Class for efficiently matching a bag-of-words representation of a document (image) against a database of known documents.
#include <database.h>
Classes | |
struct | WordFrequency |
Public Member Functions | |
void | computeTfIdfWeights (float default_weight=1.0f) |
Compute the TF-IDF weights of all the words. Call this after inserting a corpus of training examples into the database. | |
Database (uint32_t num_words=0) | |
Constructor. | |
void | find (const std::vector< Word > &document, size_t N, std::vector< Match > &matches) const |
Find the top N matches in the database for the query document. | |
DocId | findAndInsert (const std::vector< Word > &document, size_t N, std::vector< Match > &matches) |
Find the top N matches, then insert the query document. | |
DocId | insert (const std::vector< Word > &document) |
Insert a new document. | |
void | loadWeights (const std::string &file) |
Load the vocabulary word weights from a file. | |
void | saveWeights (const std::string &file) const |
Save the vocabulary word weights to a file. | |
Private Types | |
typedef std::map< Word, float > | DocumentVector |
typedef std::vector < WordFrequency > | InvertedFile |
Private Member Functions | |
void | computeVector (const std::vector< Word > &document, DocumentVector &v) const |
Static Private Member Functions | |
static void | normalize (DocumentVector &v) |
static float | sparseDistance (const DocumentVector &v1, const DocumentVector &v2) |
Private Attributes | |
std::vector< DocumentVector > | database_vectors_ |
std::vector< InvertedFile > | word_files_ |
std::vector< float > | word_weights_ |
Class for efficiently matching a bag-of-words representation of a document (image) against a database of known documents.
Definition at line 38 of file database.h.
typedef std::map<Word, float> vt::Database::DocumentVector [private] |
Definition at line 108 of file database.h.
typedef std::vector<WordFrequency> vt::Database::InvertedFile [private] |
Definition at line 104 of file database.h.
vt::Database::Database | ( | uint32_t | num_words = 0 |
) |
Constructor.
If computing weights for a new vocabulary, num_words should be the size of the vocabulary. If the weights will be loaded with loadWeights(), it can be left at zero.
Definition at line 4 of file database.cpp.
void vt::Database::computeTfIdfWeights | ( | float | default_weight = 1.0f |
) |
Compute the TF-IDF weights of all the words. Call this after inserting a corpus of training examples into the database.
default_weight | The default weight of a word that appears in none of the training documents. |
Definition at line 60 of file database.cpp.
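The role of default_weight can be illustrated with a minimal sketch. It assumes the common inverse-document-frequency form, weight = ln(N / n_i), where N is the number of training documents and n_i is the number of documents containing word i; the formula actually used in database.cpp is not stated on this page and may differ. The function name computeIdfWeights and the doc_counts input are hypothetical and are not part of vt::Database.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of vt::Database.
// doc_counts[i] is the number of training documents that contain word i.
// Assumption: weight_i = ln(num_documents / doc_counts[i]).
std::vector<float> computeIdfWeights(const std::vector<std::uint32_t>& doc_counts,
                                     std::size_t num_documents,
                                     float default_weight = 1.0f)
{
  // Every word starts at default_weight; only words that were seen get an IDF value.
  std::vector<float> weights(doc_counts.size(), default_weight);
  for (std::size_t i = 0; i < doc_counts.size(); ++i) {
    if (doc_counts[i] > 0) {
      weights[i] = std::log(static_cast<float>(num_documents) /
                            static_cast<float>(doc_counts[i]));
    }
  }
  return weights;
}
```

Under this assumption, a word that appears in every training document gets weight zero, and a word that appears in none keeps default_weight rather than producing a division by zero.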
void vt::Database::computeVector | ( | const std::vector< Word > & | document, | |
DocumentVector & | v | |||
) | const [private] |
Definition at line 99 of file database.cpp.
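This page gives no description for computeVector(). One plausible reading, sketched below, is that it turns a list of words into a sparse DocumentVector by accumulating the weight of each word occurrence. This is an assumption, not the documented behavior. The free function weightedVector, the integer Word typedef, and the word_weights parameter are hypothetical stand-ins for the private members listed above.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

typedef std::uint32_t Word;                    // assumption: words are integer IDs
typedef std::map<Word, float> DocumentVector;  // same shape as the private typedef

// Hypothetical free function, not the library's computeVector().
// Each occurrence of a word adds that word's weight to its entry.
DocumentVector weightedVector(const std::vector<Word>& document,
                              const std::vector<float>& word_weights)
{
  DocumentVector v;
  for (std::size_t i = 0; i < document.size(); ++i) {
    const Word w = document[i];
    // at() throws std::out_of_range for a word outside the vocabulary.
    v[w] += word_weights.at(w);
  }
  return v;
}
```

Because the vector is a std::map, only words that occur in the document take up space, which suits large vocabularies where each image contains a small fraction of the words.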
void vt::Database::find | ( | const std::vector< Word > & | document, | |
size_t | N, | |||
std::vector< Match > & | matches | |||
) | const |
Find the top N matches in the database for the query document.
document | The query document, a set of quantized words. | |
N | The number of matches to return. | |
[out] | matches | IDs and scores for the top N matching database documents. |
Definition at line 32 of file database.cpp.
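Selecting the top N matches from a set of scored candidates can be sketched with std::partial_sort. The ScoredDoc struct below is a hypothetical stand-in for the library's Match type, whose fields are not shown on this page, and the sketch assumes the score is a distance, so lower is better.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the library's Match type.
struct ScoredDoc {
  std::uint32_t id;
  float score;  // assumption: a distance, so a lower score is a better match
};

inline bool betterScore(const ScoredDoc& a, const ScoredDoc& b)
{
  return a.score < b.score;
}

// Return the N best candidates in ascending score order.
// If fewer than N candidates exist, all of them are returned.
std::vector<ScoredDoc> topN(std::vector<ScoredDoc> candidates, std::size_t N)
{
  const std::size_t n = std::min(N, candidates.size());
  // Only the first n positions are fully ordered; the rest are discarded.
  std::partial_sort(candidates.begin(), candidates.begin() + n,
                    candidates.end(), betterScore);
  candidates.resize(n);
  return candidates;
}
```

A partial sort avoids ordering the whole candidate list when N is much smaller than the database size.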
DocId vt::Database::findAndInsert | ( | const std::vector< Word > & | document, | |
size_t | N, | |||
std::vector< Match > & | matches | |||
) |
Find the top N matches, then insert the query document.
This is equivalent to calling find() followed by insert(), but may be more efficient.
document | The document to match then insert, a set of quantized words. | |
N | The number of matches to return. | |
[out] | matches | IDs and scores for the top N matching database documents. |
Definition at line 53 of file database.cpp.
DocId vt::Database::insert | ( | const std::vector< Word > & | document | ) |
Insert a new document.
document | The set of quantized words in a document/image. |
Definition at line 10 of file database.cpp.
void vt::Database::loadWeights | ( | const std::string & | file | ) |
Load the vocabulary word weights from a file.
Definition at line 81 of file database.cpp.
void vt::Database::normalize | ( | DocumentVector & | v | ) | [static, private] |
Definition at line 108 of file database.cpp.
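The page does not say which norm normalize() uses. As an illustration only, the sketch below assumes L1 normalization, so the absolute values of the entries sum to one. The function name normalizeL1 and the integer Word typedef are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <map>

typedef std::uint32_t Word;                    // assumption: words are integer IDs
typedef std::map<Word, float> DocumentVector;  // same shape as the private typedef

// Hypothetical L1 normalization, not necessarily what vt::Database::normalize does.
void normalizeL1(DocumentVector& v)
{
  float sum = 0.0f;
  for (DocumentVector::const_iterator it = v.begin(); it != v.end(); ++it)
    sum += std::fabs(it->second);
  if (sum == 0.0f)
    return;  // empty or all-zero vector: leave it unchanged
  for (DocumentVector::iterator it = v.begin(); it != v.end(); ++it)
    it->second /= sum;
}
```

Normalizing makes scores comparable between documents with different numbers of words.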
void vt::Database::saveWeights | ( | const std::string & | file | ) | const |
Save the vocabulary word weights to a file.
Definition at line 73 of file database.cpp.
float vt::Database::sparseDistance | ( | const DocumentVector & | v1, | |
const DocumentVector & | v2 | |||
) | [static, private] |
Definition at line 118 of file database.cpp.
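A distance between two sparse vectors stored as std::map can be computed in a single merge-style pass, because both maps iterate in ascending key order. The sketch below assumes an L1 (sum of absolute differences) distance; the metric used by sparseDistance() is not stated on this page. The name sparseL1Distance and the integer Word typedef are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <map>

typedef std::uint32_t Word;                    // assumption: words are integer IDs
typedef std::map<Word, float> DocumentVector;  // same shape as the private typedef

// Hypothetical L1 distance over two sparse vectors.
// A word missing from one vector is treated as having value zero there.
float sparseL1Distance(const DocumentVector& v1, const DocumentVector& v2)
{
  float distance = 0.0f;
  DocumentVector::const_iterator i1 = v1.begin(), i2 = v2.begin();
  while (i1 != v1.end() && i2 != v2.end()) {
    if (i1->first < i2->first) {         // word only in v1
      distance += std::fabs(i1->second);
      ++i1;
    } else if (i2->first < i1->first) {  // word only in v2
      distance += std::fabs(i2->second);
      ++i2;
    } else {                             // word in both
      distance += std::fabs(i1->second - i2->second);
      ++i1;
      ++i2;
    }
  }
  // Whatever remains in either vector has no counterpart in the other.
  for (; i1 != v1.end(); ++i1) distance += std::fabs(i1->second);
  for (; i2 != v2.end(); ++i2) distance += std::fabs(i2->second);
  return distance;
}
```

The pass touches each stored entry once, so the cost depends on the number of nonzero words in the two documents and not on the vocabulary size.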
std::vector<DocumentVector> vt::Database::database_vectors_ [private] |
Definition at line 112 of file database.h.
std::vector<InvertedFile> vt::Database::word_files_ [private] |
Definition at line 110 of file database.h.
std::vector<float> vt::Database::word_weights_ [private] |
Definition at line 111 of file database.h.