This page describes the *Vector of Locally Aggregated Descriptors* (VLAD) image encoding of [jegou10aggregating]. See Vector of Locally Aggregated Descriptors (VLAD) encoding for an overview of the C API.
VLAD is a *feature encoding and pooling* method, similar to Fisher vectors. VLAD encodes a set of local feature descriptors $I=(\mathbf{x}_1,\dots,\mathbf{x}_N)$ extracted from an image using a dictionary built with a clustering method such as Gaussian Mixture Models (GMM) or K-means clustering. Let $q_{ik}$ be the strength of the association of data vector $\mathbf{x}_i$ to cluster $\mu_k$, such that $q_{ik} \geq 0$ and $\sum_{k=1}^K q_{ik} = 1$. The association may be either soft (e.g. obtained as the posterior probabilities of the GMM clusters) or hard (e.g. obtained by vector quantization with K-means).
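The two kinds of association can be illustrated with a minimal sketch in plain Python. This is not the VLFeat API: the function names `hard_assignments` and `soft_assignments` are hypothetical, and the soft variant assumes an equal-weight, isotropic GMM with a shared standard deviation `sigma`, which is a simplification chosen only to keep the example short.

```python
import math

def hard_assignments(data, means):
    """Return q[i][k] = 1 for the mean nearest to data[i], and 0 elsewhere."""
    q = []
    for x in data:
        # Squared Euclidean distance from x to every cluster mean.
        dists = [sum((a - b) ** 2 for a, b in zip(x, mu)) for mu in means]
        nearest = dists.index(min(dists))
        q.append([1.0 if k == nearest else 0.0 for k in range(len(means))])
    return q

def soft_assignments(data, means, sigma=1.0):
    """Posteriors under an equal-weight, isotropic GMM (illustrative assumption)."""
    q = []
    for x in data:
        dists = [sum((a - b) ** 2 for a, b in zip(x, mu)) for mu in means]
        # Subtract the smallest distance before exponentiating, for stability.
        d0 = min(dists)
        weights = [math.exp(-(d - d0) / (2.0 * sigma ** 2)) for d in dists]
        total = sum(weights)
        q.append([w / total for w in weights])
    return q

# Toy data: three 2-D descriptors and two cluster means.
data = [[0.0, 0.0], [1.0, 0.2], [4.0, 4.0]]
means = [[0.0, 0.0], [4.0, 4.0]]
q_hard = hard_assignments(data, means)
q_soft = soft_assignments(data, means)
```

In both cases every row of `q` is non-negative and sums to one, which is the constraint stated above.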
The $\mu_k$ are the cluster *means*, vectors of the same dimension as the data $\mathbf{x}_i$. VLAD encodes the features $\mathbf{x}_i$ by considering the *residuals* \[ \mathbf{v}_k = \sum_{i=1}^{N} q_{ik} (\mathbf{x}_{i} - \mu_k). \] The residuals are stacked together to obtain the vector \[ \hat\Phi(I) = \begin{bmatrix} \vdots \\ \mathbf{v}_k \\ \vdots \end{bmatrix} \]
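As a rough illustration of the residual accumulation and stacking, the following plain-Python sketch computes the unnormalized encoding for a toy input. The function name `vlad_encode` is hypothetical and is not the VLFeat C function; the associations `q` are taken as given and are hard assignments here.

```python
def vlad_encode(data, means, q):
    """Stack the residuals v_k = sum_i q[i][k] * (x_i - mu_k) into one flat list."""
    dim = len(means[0])
    encoding = []
    for k, mu in enumerate(means):
        v_k = [0.0] * dim
        for x, q_i in zip(data, q):
            for d in range(dim):
                # Accumulate the association-weighted residual for cluster k.
                v_k[d] += q_i[k] * (x[d] - mu[d])
        # Append v_k after v_1, ..., v_{k-1}.
        encoding.extend(v_k)
    return encoding

# Toy data: three 2-D descriptors, two cluster means, hard assignments.
data = [[0.0, 0.0], [1.0, 0.5], [4.0, 5.0]]
means = [[0.0, 0.0], [4.0, 4.0]]
q = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

phi = vlad_encode(data, means, q)
print(phi)  # [1.0, 0.5, 0.0, 1.0]
```

The result has $K \times D$ entries (here $2 \times 2 = 4$), one block of $D$ values per cluster.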
Before the VLAD encoding is used, it is usually normalized, as explained next in VLAD normalization.
The VLFeat VLAD implementation supports a number of different normalization strategies. These are optionally applied in this order: