This page discusses the Fisher Kernel (FK) of [jaakkola98exploiting] and shows how the Fisher Vector (FV) of [perronnin06fisher] can be derived from it as a special case. The FK induces a similarity measure between data points $\mathbf{x}$ and $\mathbf{x}'$ from a parametric generative model $p(\mathbf{x}|\Theta)$ of the data. The parameter $\Theta$ of the model is selected to fit the a-priori distribution of the data, and is usually the Maximum Likelihood (ML) estimate obtained from a set of training examples. Once the generative model is learned, each particular datum $\mathbf{x}$ is represented by looking at how it affects the ML parameter estimate. This effect is measured by computing the gradient of the log-likelihood term corresponding to $\mathbf{x}$:
\[ \hat\Phi(\mathbf{x}) = \nabla_\Theta \log p(\mathbf{x}|\Theta) \]
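As a concrete illustration, the gradient of the log-likelihood can be written in closed form for a simple model. The sketch below assumes a one-dimensional Gaussian with parameters $\Theta = (\mu, \sigma)$; this model and the names `gaussian_log_likelihood` and `gaussian_score` are chosen here for illustration and are not part of the original text.

```python
import numpy as np

def gaussian_log_likelihood(x, mu, sigma):
    """Log-density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def gaussian_score(x, mu, sigma):
    """Gradient of the log-likelihood with respect to Theta = (mu, sigma)."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = ((x - mu) ** 2 - sigma ** 2) / sigma ** 3
    return np.array([d_mu, d_sigma])

# Illustrative values (assumed): one datum and one parameter setting.
score = gaussian_score(1.5, mu=0.0, sigma=2.0)
print(score)  # [ 0.375   -0.21875]
```

The first component says that increasing $\mu$ would raise the likelihood of this datum, while the second says that decreasing $\sigma$ would, since the datum lies closer to the mean than one standard deviation.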
The vectors $\hat\Phi(\mathbf{x})$ should be appropriately scaled before they can be meaningfully compared. This is obtained by *whitening* the data, i.e. by multiplying the vectors by the inverse of the square root of their *covariance matrix*. The covariance matrix can be obtained from the generative model $p(\mathbf{x}|\Theta)$ itself. Since $\Theta$ is the ML parameter and $\hat\Phi(\mathbf{x})$ is the gradient of the log-likelihood function, its expected value $E[\hat\Phi(\mathbf{x})]$ is zero. Thus, since the vectors are already centered, their covariance matrix is simply:
\[ H = E_{\mathbf{x} \sim p(\mathbf{x}|\Theta)} [\hat\Phi(\mathbf{x}) \hat\Phi(\mathbf{x})^\top] \]
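The expectation defining $H$ can be checked numerically by sampling from the model. The sketch below again assumes a one-dimensional Gaussian with $\Theta = (\mu, \sigma)$, for which $H = \mathrm{diag}(1/\sigma^2, 2/\sigma^2)$ in closed form. The function names, the sample count, and the seed are illustrative choices, not taken from the original text.

```python
import numpy as np

def gaussian_score(x, mu, sigma):
    """Log-likelihood gradient w.r.t. (mu, sigma); x may be an array of samples."""
    x = np.asarray(x, dtype=float)
    d_mu = (x - mu) / sigma ** 2
    d_sigma = ((x - mu) ** 2 - sigma ** 2) / sigma ** 3
    return np.stack([d_mu, d_sigma], axis=-1)  # shape (..., 2)

def estimate_score_covariance(mu, sigma, num_samples=200_000, seed=0):
    """Monte Carlo estimate of the mean and second moment of the gradient vectors."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mu, sigma, size=num_samples)  # x ~ p(x | Theta)
    scores = gaussian_score(samples, mu, sigma)        # (num_samples, 2)
    mean_score = scores.mean(axis=0)
    H_estimate = scores.T @ scores / num_samples       # E[score score^T]
    return mean_score, H_estimate

mu, sigma = 0.0, 2.0
mean_score, H_estimate = estimate_score_covariance(mu, sigma)
H_exact = np.diag([1.0 / sigma ** 2, 2.0 / sigma ** 2])  # closed form for this model
```

With enough samples, `mean_score` should be close to zero and `H_estimate` close to `H_exact`, up to Monte Carlo error.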
Note that $H$ is also the *Fisher information matrix* of the model. The final FV encoding $\Phi(\mathbf{x})$ is given by the whitened gradient of the log-likelihood function, i.e.:
\[ \Phi(\mathbf{x}) = H^{-\frac{1}{2}} \nabla_\Theta \log p(\mathbf{x}|\Theta). \]
Taking the inner product of two such vectors yields the *Fisher kernel*:
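\[ K(\mathbf{x},\mathbf{x}') = \langle \Phi(\mathbf{x}),\Phi(\mathbf{x}') \rangle = \nabla_\Theta \log p(\mathbf{x}|\Theta)^\top H^{-1} \nabla_\Theta \log p(\mathbf{x}'|\Theta). \]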
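The two expressions for the kernel can be compared on a small example. The sketch below assumes a one-dimensional Gaussian model with $\Theta = (\mu, \sigma)$ and its closed-form Fisher information $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$. All function names are hypothetical helpers written for this illustration, and the inverse square root is computed by eigendecomposition, which is one of several possible choices.

```python
import numpy as np

def gaussian_score(x, mu, sigma):
    """Gradient of log N(x; mu, sigma^2) with respect to Theta = (mu, sigma)."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = ((x - mu) ** 2 - sigma ** 2) / sigma ** 3
    return np.array([d_mu, d_sigma])

def gaussian_fisher_information(sigma):
    """Closed-form H for the (mu, sigma) parameterisation of a 1-D Gaussian."""
    return np.diag([1.0 / sigma ** 2, 2.0 / sigma ** 2])

def inverse_sqrt(H):
    """Inverse square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(H)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

def fisher_vector(x, mu, sigma):
    """Whitened log-likelihood gradient: H^{-1/2} times the gradient."""
    H = gaussian_fisher_information(sigma)
    return inverse_sqrt(H) @ gaussian_score(x, mu, sigma)

def fisher_kernel(x, y, mu, sigma):
    """Kernel computed directly as gradient(x)^T H^{-1} gradient(y)."""
    H = gaussian_fisher_information(sigma)
    return gaussian_score(x, mu, sigma) @ np.linalg.solve(H, gaussian_score(y, mu, sigma))

mu, sigma = 0.0, 2.0                       # illustrative parameter values
phi_x = fisher_vector(1.5, mu, sigma)
phi_y = fisher_vector(-0.5, mu, sigma)
k_inner = phi_x @ phi_y                    # inner product of the two encodings
k_direct = fisher_kernel(1.5, -0.5, mu, sigma)
```

Both `k_inner` and `k_direct` evaluate the same quantity. Computing the encodings once and taking inner products is the more convenient form when many pairs must be compared, since the whitening is then applied once per datum rather than once per pair.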