vt::Database Class Reference

Class for efficiently matching a bag-of-words representation of a document (image) against a database of known documents.

#include <database.h>

Classes

struct  WordFrequency

Public Member Functions

void computeTfIdfWeights (float default_weight=1.0f)
 Compute the TF-IDF weights of all the words. Call this after inserting a corpus of training examples into the database.
 Database (uint32_t num_words=0)
 Constructor.
void find (const std::vector< Word > &document, size_t N, std::vector< Match > &matches) const
 Find the top N matches in the database for the query document.
DocId findAndInsert (const std::vector< Word > &document, size_t N, std::vector< Match > &matches)
 Find the top N matches, then insert the query document.
DocId insert (const std::vector< Word > &document)
 Insert a new document.
void loadWeights (const std::string &file)
 Load the vocabulary word weights from a file.
void saveWeights (const std::string &file) const
 Save the vocabulary word weights to a file.

Private Types

typedef std::map< Word, float > DocumentVector
typedef std::vector< WordFrequency > InvertedFile

Private Member Functions

void computeVector (const std::vector< Word > &document, DocumentVector &v) const

Static Private Member Functions

static void normalize (DocumentVector &v)
static float sparseDistance (const DocumentVector &v1, const DocumentVector &v2)

Private Attributes

std::vector< DocumentVector > database_vectors_
std::vector< InvertedFile > word_files_
std::vector< float > word_weights_

Detailed Description

Class for efficiently matching a bag-of-words representation of a document (image) against a database of known documents.

Definition at line 38 of file database.h.


Member Typedef Documentation

typedef std::map<Word, float> vt::Database::DocumentVector [private]
Todo:
Use sorted vector?

Definition at line 108 of file database.h.

typedef std::vector<WordFrequency> vt::Database::InvertedFile [private]

Definition at line 104 of file database.h.


Constructor & Destructor Documentation

vt::Database::Database ( uint32_t  num_words = 0  ) 

Constructor.

If computing weights for a new vocabulary, num_words should be the size of the vocabulary. If calling loadWeights(), it can be left zero.

Definition at line 4 of file database.cpp.


Member Function Documentation

void vt::Database::computeTfIdfWeights ( float  default_weight = 1.0f  ) 

Compute the TF-IDF weights of all the words. Call this after inserting a corpus of training examples into the database.

Parameters:
default_weight The default weight of a word that appears in none of the training documents.

Definition at line 60 of file database.cpp.
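
The weighting step can be illustrated with a minimal sketch. It assumes the common inverse-document-frequency form log(N / N_i), where N is the number of training documents and N_i is the number of documents containing word i; the source does not state the exact formula used by the class. The function name idfWeights and its parameters are hypothetical and are not part of vt::Database.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of vt::Database.
// docs_per_word[i] is the number of training documents that contain word i.
// Assumes the weight of word i is log(num_docs / docs_per_word[i]).
std::vector<float> idfWeights(const std::vector<std::size_t>& docs_per_word,
                              std::size_t num_docs,
                              float default_weight = 1.0f)
{
  // Words that appear in no training document keep the default weight.
  std::vector<float> weights(docs_per_word.size(), default_weight);
  for (std::size_t i = 0; i < docs_per_word.size(); ++i) {
    if (docs_per_word[i] != 0) {
      weights[i] = std::log(static_cast<float>(num_docs) /
                            static_cast<float>(docs_per_word[i]));
    }
  }
  return weights;
}
```

Under this assumption, a word that occurs in every training document gets weight zero, so it contributes nothing to matching, while rare words are weighted more heavily.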

void vt::Database::computeVector ( const std::vector< Word > &  document,
DocumentVector &  v 
) const [private]

Definition at line 99 of file database.cpp.
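
This member is undocumented. As a rough illustration only, one plausible way to turn a list of words into a sparse weighted vector is to add the word's weight once per occurrence. The sketch below assumes that Word is an unsigned 32-bit integer usable as an index into the weight table; weightedVector is a hypothetical free function, not the class's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

typedef std::uint32_t Word;                    // assumed word type
typedef std::map<Word, float> DocumentVector;  // same shape as the class typedef

// Hypothetical free function: accumulates one weight per word occurrence,
// so a word seen k times ends up with k * weight.
DocumentVector weightedVector(const std::vector<Word>& document,
                              const std::vector<float>& word_weights)
{
  DocumentVector v;
  for (std::size_t i = 0; i < document.size(); ++i) {
    const Word word = document[i];
    // at() throws std::out_of_range if the word is outside the vocabulary.
    v[word] += word_weights.at(word);
  }
  return v;
}
```

Only words that actually occur in the document get an entry, which keeps the vector sparse for large vocabularies.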

void vt::Database::find ( const std::vector< Word > &  document,
size_t  N,
std::vector< Match > &  matches 
) const

Find the top N matches in the database for the query document.

Parameters:
document The query document, a set of quantized words.
N The number of matches to return.
[out] matches IDs and scores for the top N matching database documents.

Todo:
Try only computing distances against documents sharing at least one word

Definition at line 32 of file database.cpp.
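
Selecting the top N results from a list of scored documents can be sketched with std::partial_sort. ScoredDoc is a hypothetical stand-in for the Match type, whose layout is not shown on this page, and the sketch assumes that the score is a distance, so lower is better.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for vt::Match.
struct ScoredDoc {
  std::uint32_t id;
  float score;  // assumed to be a distance: lower means a better match
};

// Comparator that orders candidates from best (lowest score) to worst.
inline bool betterMatch(const ScoredDoc& a, const ScoredDoc& b)
{
  return a.score < b.score;
}

// Returns the n best candidates in ascending score order.
// If fewer than n candidates exist, all of them are returned.
std::vector<ScoredDoc> topN(std::vector<ScoredDoc> candidates, std::size_t n)
{
  n = std::min(n, candidates.size());
  std::partial_sort(candidates.begin(),
                    candidates.begin() + static_cast<std::ptrdiff_t>(n),
                    candidates.end(), betterMatch);
  candidates.resize(n);
  return candidates;
}
```

std::partial_sort orders only the first n elements, which is cheaper than a full sort when N is much smaller than the database size.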

DocId vt::Database::findAndInsert ( const std::vector< Word > &  document,
size_t  N,
std::vector< Match > &  matches 
)

Find the top N matches, then insert the query document.

This is equivalent to calling find() followed by insert(), but may be more efficient.

Parameters:
document The document to match then insert, a set of quantized words.
N The number of matches to return.
[out] matches IDs and scores for the top N matching database documents.

Todo:
Can this be accelerated? Could iterate over words only once?

Definition at line 53 of file database.cpp.

DocId vt::Database::insert ( const std::vector< Word > &  document  ) 

Insert a new document.

Parameters:
document The set of quantized words in a document/image.
Returns:
An ID representing the inserted document.

Todo:
Evaluate whether sorting words makes much difference in speed

Definition at line 10 of file database.cpp.

void vt::Database::loadWeights ( const std::string &  file  ) 

Load the vocabulary word weights from a file.

Definition at line 81 of file database.cpp.

void vt::Database::normalize ( DocumentVector &  v  )  [static, private]

Definition at line 108 of file database.cpp.

void vt::Database::saveWeights ( const std::string &  file  )  const

Save the vocabulary word weights to a file.

Definition at line 73 of file database.cpp.

float vt::Database::sparseDistance ( const DocumentVector &  v1,
const DocumentVector &  v2 
) [static, private]

Definition at line 118 of file database.cpp.
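
This member is undocumented. As an illustration of how a distance between two sparse vectors stored as std::map can be computed in a single merged pass, the sketch below assumes an L1 (sum of absolute differences) metric; the metric actually used by the class is not stated here. sparseL1Distance and the Word typedef are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <map>

typedef std::uint32_t Word;                    // assumed word type
typedef std::map<Word, float> DocumentVector;  // same shape as the class typedef

// Hypothetical sketch: L1 distance between two sparse vectors.
// A word missing from one vector is treated as having value 0 there.
float sparseL1Distance(const DocumentVector& v1, const DocumentVector& v2)
{
  float distance = 0.0f;
  DocumentVector::const_iterator i1 = v1.begin();
  DocumentVector::const_iterator i2 = v2.begin();
  // std::map iterates in key order, so both vectors can be merged in one pass.
  while (i1 != v1.end() && i2 != v2.end()) {
    if (i1->first < i2->first) {
      distance += std::fabs(i1->second);
      ++i1;
    } else if (i2->first < i1->first) {
      distance += std::fabs(i2->second);
      ++i2;
    } else {
      distance += std::fabs(i1->second - i2->second);
      ++i1;
      ++i2;
    }
  }
  // Whatever remains in either vector has no counterpart in the other.
  for (; i1 != v1.end(); ++i1) distance += std::fabs(i1->second);
  for (; i2 != v2.end(); ++i2) distance += std::fabs(i2->second);
  return distance;
}
```

The cost is linear in the number of non-zero entries of the two vectors rather than in the vocabulary size.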


Member Data Documentation

std::vector<DocumentVector> vt::Database::database_vectors_ [private]

Definition at line 112 of file database.h.

std::vector<InvertedFile> vt::Database::word_files_ [private]

Definition at line 110 of file database.h.

std::vector<float> vt::Database::word_weights_ [private]

Definition at line 111 of file database.h.


The documentation for this class was generated from the following files:
database.h
database.cpp


vocabulary_tree
Author(s): Patrick Mihelich
autogenerated on Fri Jan 11 09:14:12 2013