vt::Database Class Reference

Class for efficiently matching a bag-of-words representation of a document (image) against a database of known documents.

#include <database.h>

Classes

struct  WordFrequency

Public Member Functions

void computeTfIdfWeights (float default_weight=1.0f)
 Compute the TF-IDF weights of all the words. To be called after inserting a corpus of training examples into the database.
 Database (uint32_t num_words=0)
 Constructor.
void find (const std::vector< Word > &document, size_t N, std::vector< Match > &matches) const
 Find the top N matches in the database for the query document.
DocId findAndInsert (const std::vector< Word > &document, size_t N, std::vector< Match > &matches)
 Find the top N matches, then insert the query document.
DocId insert (const std::vector< Word > &document)
 Insert a new document.
void loadWeights (const std::string &file)
 Load the vocabulary word weights from a file.
void saveWeights (const std::string &file) const
 Save the vocabulary word weights to a file.

Private Types

typedef std::map< Word, float > DocumentVector
typedef std::vector< WordFrequency > InvertedFile

Private Member Functions

void computeVector (const std::vector< Word > &document, DocumentVector &v) const

Static Private Member Functions

static void normalize (DocumentVector &v)
static float sparseDistance (const DocumentVector &v1, const DocumentVector &v2)

Private Attributes

std::vector< DocumentVector > database_vectors_
std::vector< InvertedFile > word_files_
std::vector< float > word_weights_

Detailed Description

Class for efficiently matching a bag-of-words representation of a document (image) against a database of known documents.

Definition at line 40 of file database.h.


Member Typedef Documentation

typedef std::map<Word, float> vt::Database::DocumentVector [private]
Todo:
Use sorted vector?

Definition at line 110 of file database.h.

typedef std::vector<WordFrequency> vt::Database::InvertedFile [private]

Definition at line 106 of file database.h.


Constructor & Destructor Documentation

vt::Database::Database ( uint32_t  num_words = 0)

Constructor.

If computing weights for a new vocabulary, num_words should be the size of the vocabulary. If calling loadWeights(), it can be left zero.

Definition at line 11 of file database.cpp.


Member Function Documentation

void vt::Database::computeTfIdfWeights ( float  default_weight = 1.0f)

Compute the TF-IDF weights of all the words. To be called after inserting a corpus of training examples into the database.

Parameters:
default_weight  The default weight of a word that appears in none of the training documents.

Definition at line 67 of file database.cpp.
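
The weighting step can be illustrated with a small standalone sketch. It assumes the common inverse-document-frequency form log(N / N_i), where N is the number of training documents and N_i the number of documents containing word i; the exact formula used by computeTfIdfWeights() is not stated on this page. The function idfWeights and its parameters are hypothetical and not part of vt::Database.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of vt::Database): one IDF weight per word.
// docs_per_word[i] is the number of training documents containing word i.
std::vector<float> idfWeights(const std::vector<std::size_t>& docs_per_word,
                              std::size_t num_docs,
                              float default_weight = 1.0f)
{
  assert(num_docs > 0);
  // Words seen in no training document keep the default weight.
  std::vector<float> weights(docs_per_word.size(), default_weight);
  for (std::size_t i = 0; i < docs_per_word.size(); ++i) {
    if (docs_per_word[i] != 0) {
      weights[i] = std::log(static_cast<float>(num_docs) /
                            static_cast<float>(docs_per_word[i]));
    }
  }
  return weights;
}
```

Under this assumption, a word that occurs in every training document gets weight log(1) = 0 and so contributes nothing to a match, while rare words are weighted most heavily.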

void vt::Database::computeVector ( const std::vector< Word > &  document,
DocumentVector &  v 
) const [private]

Definition at line 106 of file database.cpp.
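
This member is undocumented. One plausible reading, given the DocumentVector typedef and word_weights_, is that it accumulates a weighted count for each word in the document. The sketch below shows that idea only; weightedCounts is a hypothetical name, and treating Word as a 32-bit unsigned index into the weight table is an assumption.

```cpp
#include <cstdint>
#include <map>
#include <vector>

typedef std::uint32_t Word;                    // assumption: words are vocabulary indices
typedef std::map<Word, float> DocumentVector;  // same shape as the private typedef

// Hypothetical sketch: add the word's weight once per occurrence, so each
// entry ends up as (term frequency) * (word weight).
DocumentVector weightedCounts(const std::vector<Word>& document,
                              const std::vector<float>& word_weights)
{
  DocumentVector v;
  for (Word w : document) {
    v[w] += word_weights.at(w);  // at() throws std::out_of_range for unknown words
  }
  return v;
}
```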

void vt::Database::find ( const std::vector< Word > &  document,
size_t  N,
std::vector< Match > &  matches 
) const

Find the top N matches in the database for the query document.

Parameters:
document       The query document, a set of quantized words.
N              The number of matches to return.
[out] matches  IDs and scores for the top N matching database documents.
Todo:
Try only computing distances against documents sharing at least one word

Definition at line 39 of file database.cpp.
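
Selecting the top N results from a list of scored documents can be sketched with std::partial_sort. ScoredDoc is a hypothetical stand-in for the library's Match type, whose fields are not shown on this page, and the sketch assumes that a lower score means a closer match.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Match: a document ID and its score.
struct ScoredDoc {
  std::uint32_t id;
  float score;  // assumed to be a distance: lower means more similar
};

// Keep only the n best-scoring entries, best first.
void keepTopN(std::vector<ScoredDoc>& scored, std::size_t n)
{
  n = std::min(n, scored.size());  // fewer than n documents may exist
  std::partial_sort(scored.begin(), scored.begin() + n, scored.end(),
                    [](const ScoredDoc& a, const ScoredDoc& b) {
                      return a.score < b.score;
                    });
  scored.resize(n);
}
```

A partial sort orders only the first n elements, which avoids fully sorting the whole database when n is small.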

DocId vt::Database::findAndInsert ( const std::vector< Word > &  document,
size_t  N,
std::vector< Match > &  matches 
)

Find the top N matches, then insert the query document.

This is equivalent to calling find() followed by insert(), but may be more efficient.

Parameters:
document       The document to match then insert, a set of quantized words.
N              The number of matches to return.
[out] matches  IDs and scores for the top N matching database documents.
Todo:
Can this be accelerated? Could iterate over words only once?

Definition at line 60 of file database.cpp.

DocId vt::Database::insert ( const std::vector< Word > &  document)

Insert a new document.

Parameters:
document  The set of quantized words in a document/image.
Returns:
An ID representing the inserted document.
Todo:
Evaluate whether sorting words makes much difference in speed

Definition at line 17 of file database.cpp.
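
The private members suggest that inserted documents are recorded in one inverted file per vocabulary word. The following toy index illustrates that data structure only. Posting, PostingList and ToyIndex are hypothetical names, the fields of WordFrequency are not shown on this page, and handing out sequential IDs starting at zero is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

typedef std::uint32_t Word;   // assumption: words are vocabulary indices
typedef std::uint32_t DocId;  // assumption: IDs are sequential integers

// Hypothetical analogue of WordFrequency: which document, and how often.
struct Posting {
  DocId id;
  std::uint32_t count;
};
typedef std::vector<Posting> PostingList;  // analogue of InvertedFile

struct ToyIndex {
  std::vector<PostingList> word_files;  // one posting list per vocabulary word
  DocId num_docs = 0;

  explicit ToyIndex(std::size_t num_words) : word_files(num_words) {}

  // Count each word's occurrences, then append one posting per distinct word.
  DocId insert(const std::vector<Word>& document)
  {
    DocId id = num_docs++;
    std::map<Word, std::uint32_t> counts;
    for (Word w : document) ++counts[w];
    for (const auto& wc : counts) {
      word_files.at(wc.first).push_back(Posting{id, wc.second});
    }
    return id;
  }
};
```

With this layout, the documents that share a word with a query can be found by reading only the posting lists of the query's words.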

void vt::Database::loadWeights ( const std::string &  file)

Load the vocabulary word weights from a file.

Definition at line 88 of file database.cpp.

void vt::Database::normalize ( DocumentVector &  v) [static, private]

Definition at line 115 of file database.cpp.
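
This member is undocumented, and the norm it uses is not stated here. As an illustration only, the sketch below scales a sparse vector to unit L1 norm, which is one common choice for bag-of-words vectors. normalizeL1 is a hypothetical name, and Word as a 32-bit unsigned integer is an assumption.

```cpp
#include <cmath>
#include <cstdint>
#include <map>

typedef std::uint32_t Word;                    // assumption: words are vocabulary indices
typedef std::map<Word, float> DocumentVector;  // same shape as the private typedef

// Hypothetical sketch: scale the entries so their absolute values sum to 1.
void normalizeL1(DocumentVector& v)
{
  float sum = 0.0f;
  for (const auto& entry : v) sum += std::fabs(entry.second);
  if (sum == 0.0f) return;  // leave an empty or all-zero vector untouched
  for (auto& entry : v) entry.second /= sum;
}
```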

void vt::Database::saveWeights ( const std::string &  file) const

Save the vocabulary word weights to a file.

Definition at line 80 of file database.cpp.

float vt::Database::sparseDistance ( const DocumentVector &  v1,
const DocumentVector &  v2 
) [static, private]

Definition at line 125 of file database.cpp.
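
This member is undocumented, and the metric it computes is not stated here. Because std::map iterates in key order, a distance between two sparse vectors can be computed with a single merge-style pass over both. The sketch below does this for the L1 distance, which is an assumption; sparseL1 is a hypothetical name.

```cpp
#include <cmath>
#include <cstdint>
#include <map>

typedef std::uint32_t Word;                    // assumption: words are vocabulary indices
typedef std::map<Word, float> DocumentVector;  // same shape as the private typedef

// Hypothetical sketch: L1 distance between two sparse vectors. A word that is
// missing from one vector is treated as having value 0 there.
float sparseL1(const DocumentVector& a, const DocumentVector& b)
{
  float d = 0.0f;
  DocumentVector::const_iterator i = a.begin(), j = b.begin();
  while (i != a.end() && j != b.end()) {
    if (i->first < j->first) {          // word only in a
      d += std::fabs(i->second); ++i;
    } else if (j->first < i->first) {   // word only in b
      d += std::fabs(j->second); ++j;
    } else {                            // word in both
      d += std::fabs(i->second - j->second); ++i; ++j;
    }
  }
  for (; i != a.end(); ++i) d += std::fabs(i->second);  // leftovers of a
  for (; j != b.end(); ++j) d += std::fabs(j->second);  // leftovers of b
  return d;
}
```

The pass costs time proportional to the number of nonzero entries in the two vectors, not the vocabulary size.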


Member Data Documentation

std::vector<DocumentVector> vt::Database::database_vectors_ [private]

Definition at line 114 of file database.h.

std::vector<InvertedFile> vt::Database::word_files_ [private]

Definition at line 112 of file database.h.

std::vector<float> vt::Database::word_weights_ [private]

Definition at line 113 of file database.h.


The documentation for this class was generated from the following files:

database.h
database.cpp

vocabulary_tree
Author(s): Patrick Mihelich
autogenerated on Thu Jan 2 2014 12:12:26