Class for efficiently matching a bag-of-words representation of a document (image) against a database of known documents.
#include <database.h>
Classes | |
struct | WordFrequency |
Public Member Functions | |
void | computeTfIdfWeights (float default_weight=1.0f) |
Compute the TF-IDF weights of all the words. To be called after inserting a corpus of training examples into the database. | |
Database (uint32_t num_words=0) | |
Constructor. | |
void | find (const std::vector< Word > &document, size_t N, std::vector< Match > &matches) const |
Find the top N matches in the database for the query document. | |
DocId | findAndInsert (const std::vector< Word > &document, size_t N, std::vector< Match > &matches) |
Find the top N matches, then insert the query document. | |
DocId | insert (const std::vector< Word > &document) |
Insert a new document. | |
void | loadWeights (const std::string &file) |
Load the vocabulary word weights from a file. | |
void | saveWeights (const std::string &file) const |
Save the vocabulary word weights to a file. | |
Private Types | |
typedef std::map< Word, float > | DocumentVector |
typedef std::vector < WordFrequency > | InvertedFile |
Private Member Functions | |
void | computeVector (const std::vector< Word > &document, DocumentVector &v) const |
Static Private Member Functions | |
static void | normalize (DocumentVector &v) |
static float | sparseDistance (const DocumentVector &v1, const DocumentVector &v2) |
Private Attributes | |
std::vector< DocumentVector > | database_vectors_ |
std::vector< InvertedFile > | word_files_ |
std::vector< float > | word_weights_ |
Class for efficiently matching a bag-of-words representation of a document (image) against a database of known documents.
Definition at line 40 of file database.h.
typedef std::map<Word, float> vt::Database::DocumentVector [private] |
Definition at line 110 of file database.h.
typedef std::vector<WordFrequency> vt::Database::InvertedFile [private] |
Definition at line 106 of file database.h.
vt::Database::Database (uint32_t num_words=0)
Constructor.
If computing weights for a new vocabulary, num_words should be the size of the vocabulary. If calling loadWeights(), it can be left at zero.
Definition at line 11 of file database.cpp.
void vt::Database::computeTfIdfWeights (float default_weight=1.0f)
Compute the TF-IDF weights of all the words. To be called after inserting a corpus of training examples into the database.
default_weight | The default weight of a word that appears in none of the training documents. |
Definition at line 67 of file database.cpp.
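The role of default_weight can be illustrated with a minimal, standalone sketch. It assumes the common inverse-document-frequency form log(N / N_w), where N is the number of training documents and N_w the number containing word w; the formula the library actually uses is not stated on this page. The helper idfWeights and its parameters are hypothetical and not part of vt::Database.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of vt::Database.
// doc_counts[w] = number of training documents that contain word w.
// num_docs      = total number of training documents.
std::vector<float> idfWeights(const std::vector<std::size_t>& doc_counts,
                              std::size_t num_docs,
                              float default_weight = 1.0f)
{
  // Start every word at default_weight; seen words are overwritten below.
  std::vector<float> weights(doc_counts.size(), default_weight);
  for (std::size_t w = 0; w < doc_counts.size(); ++w) {
    assert(doc_counts[w] <= num_docs);
    if (doc_counts[w] > 0) {
      // Assumed IDF form: rare words get large weights, and a word that
      // occurs in every document gets weight 0.
      weights[w] = std::log(static_cast<float>(num_docs) /
                            static_cast<float>(doc_counts[w]));
    }
    // Words seen in no training document keep default_weight.
  }
  return weights;
}
```

Under this assumed formula, a word that appears in every training document contributes nothing to a match, while a word that was never seen falls back to default_weight.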
void vt::Database::computeVector (const std::vector< Word > &document, DocumentVector &v) const [private]
Definition at line 106 of file database.cpp.
void vt::Database::find (const std::vector< Word > &document, size_t N, std::vector< Match > &matches) const
Find the top N matches in the database for the query document.
document | The query document, a bag of quantized words. |
N | The number of matches to return. |
[out] matches | IDs and scores for the top N matching database documents. |
Definition at line 39 of file database.cpp.
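Selecting the top N matches from a list of scored candidates can be sketched as follows. This is a standalone illustration and not the library's code: ScoredDoc is a hypothetical stand-in for Match, and the convention that a lower score means a better match is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the library's Match type.
// Assumption: a lower score means a more similar document.
struct ScoredDoc {
  std::uint32_t id;
  float score;
};

bool betterScore(const ScoredDoc& a, const ScoredDoc& b)
{
  return a.score < b.score;
}

// Write the N best candidates into matches, best first. If fewer than N
// candidates exist, all of them are returned.
void topN(std::vector<ScoredDoc> candidates, std::size_t N,
          std::vector<ScoredDoc>& matches)
{
  const std::size_t n = std::min(N, candidates.size());
  // Only the first n positions need to be fully ordered.
  std::partial_sort(candidates.begin(),
                    candidates.begin() + static_cast<std::ptrdiff_t>(n),
                    candidates.end(), betterScore);
  candidates.resize(n);
  matches.swap(candidates);
  assert(matches.size() <= N);
}
```

std::partial_sort avoids ordering the whole candidate list when N is much smaller than the number of documents.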
DocId vt::Database::findAndInsert (const std::vector< Word > &document, size_t N, std::vector< Match > &matches)
Find the top N matches, then insert the query document.
This is equivalent to calling find() followed by insert(), but may be more efficient.
document | The document to match and then insert, a bag of quantized words. |
N | The number of matches to return. |
[out] matches | IDs and scores for the top N matching database documents. |
Definition at line 60 of file database.cpp.
DocId vt::Database::insert (const std::vector< Word > &document)
Insert a new document.
document | The bag of quantized words in a document/image. |
Definition at line 17 of file database.cpp.
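The private members word_files_ and InvertedFile suggest one posting list per vocabulary word. A toy version of that bookkeeping might look like the sketch below. Everything in it is an assumption made for illustration: ToyIndex and Posting are hypothetical names, the Word and DocId typedefs are guesses at the library's types, and the real insert() may store different data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint32_t Word;   // assumed: an index into the vocabulary
typedef std::uint32_t DocId;  // assumed: IDs are handed out in insertion order

// Hypothetical posting: how often a word occurs in one document.
struct Posting {
  DocId id;
  std::uint32_t count;
};

// Toy index holding one posting list ("inverted file") per vocabulary word.
struct ToyIndex {
  std::vector<std::vector<Posting> > word_files;
  DocId next_id;

  explicit ToyIndex(std::size_t num_words)
    : word_files(num_words), next_id(0) {}

  DocId insert(const std::vector<Word>& document)
  {
    const DocId id = next_id++;
    for (std::size_t i = 0; i < document.size(); ++i) {
      const Word w = document[i];
      assert(w < word_files.size());
      std::vector<Posting>& file = word_files[w];
      // IDs only grow, so this document's posting, if present, is the last.
      if (!file.empty() && file.back().id == id) {
        ++file.back().count;
      } else {
        Posting p = {id, 1};
        file.push_back(p);
      }
    }
    return id;
  }
};
```

With such a layout, a query only has to visit the posting lists of the words it contains, which is what makes matching against a large database cheap.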
void vt::Database::loadWeights (const std::string &file)
Load the vocabulary word weights from a file.
Definition at line 88 of file database.cpp.
void vt::Database::normalize (DocumentVector &v) [static, private]
Definition at line 115 of file database.cpp.
void vt::Database::saveWeights (const std::string &file) const
Save the vocabulary word weights to a file.
Definition at line 80 of file database.cpp.
float vt::Database::sparseDistance (const DocumentVector &v1, const DocumentVector &v2) [static, private]
Definition at line 125 of file database.cpp.
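Because DocumentVector is a std::map keyed by word, two vectors can be compared by walking both maps in key order. The sketch below shows one way to normalize and compare such sparse vectors. The choice of the L1 norm and L1 distance is an assumption, since this page does not say which norm the library uses, and SparseVector, l1Normalize and l1Distance are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>

typedef std::map<std::uint32_t, float> SparseVector;  // word -> weight

// Scale v so that the absolute values of its entries sum to 1.
void l1Normalize(SparseVector& v)
{
  float sum = 0.0f;
  for (SparseVector::const_iterator it = v.begin(); it != v.end(); ++it)
    sum += std::fabs(it->second);
  if (sum == 0.0f) return;  // empty or all-zero vector: nothing to scale
  for (SparseVector::iterator it = v.begin(); it != v.end(); ++it)
    it->second /= sum;
}

// L1 distance between two sparse vectors; missing entries count as 0.
float l1Distance(const SparseVector& a, const SparseVector& b)
{
  float d = 0.0f;
  SparseVector::const_iterator ia = a.begin(), ib = b.begin();
  // Merge-style walk: both maps iterate in increasing key order.
  while (ia != a.end() && ib != b.end()) {
    if (ia->first < ib->first) {
      d += std::fabs(ia->second); ++ia;   // word only in a
    } else if (ib->first < ia->first) {
      d += std::fabs(ib->second); ++ib;   // word only in b
    } else {
      d += std::fabs(ia->second - ib->second); ++ia; ++ib;
    }
  }
  for (; ia != a.end(); ++ia) d += std::fabs(ia->second);
  for (; ib != b.end(); ++ib) d += std::fabs(ib->second);
  assert(d >= 0.0f);
  return d;
}
```

For L1-normalized vectors with non-negative weights, this distance ranges from 0 (identical) to 2 (no words in common).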
std::vector<DocumentVector> vt::Database::database_vectors_ [private] |
Definition at line 114 of file database.h.
std::vector<InvertedFile> vt::Database::word_files_ [private] |
Definition at line 112 of file database.h.
std::vector<float> vt::Database::word_weights_ [private] |
Definition at line 113 of file database.h.