= \operatorname{localmax}_{w\in\mathcal{W}} F(w;\ell). \]
This also means that features are never detected in isolation, but by comparing neighborhoods of them.
The next difficulty is to guarantee that detection is co-variant with image transformations. Hence, if $u$ is the pose of a feature extracted from image $\ell$, then the transformed pose $u' = w[u]$ must be detected in the transformed image $\ell' = w[\ell]$.
Since features are extracted at the local maxima of the cornerness score, a sufficient condition is that corresponding features attain the same score in the two images:
\[ \forall u\in\mathcal{W}: \quad F(u;\ell) = F(w[u];w[\ell]), \qquad\text{or}\qquad F(u;\ell) = F(w \circ u ;\ell \circ w^{-1}). \]
One simple way to satisfy this equation is to compute a cornerness score *after normalizing the image* by the inverse of the candidate feature pose warp $u$, as follows:
\[ F(u;\ell) = F(1;u^{-1}[\ell]) = F(1;\ell \circ u) = \mathcal{F}(\ell \circ u), \]
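where $1 = u^{-1} \circ u$ is the identity transformation and $\mathcal{F}$ is an arbitrary functional. Intuitively, co-variant detection is obtained by checking whether the appearance of the feature resembles a corner only *after normalization*. Formally:
\[
\begin{aligned}
F(w[u];w[\ell]) &= F(w \circ u ;\ell \circ w^{-1}) \\
&= F(1; \ell \circ w^{-1} \circ w \circ u) \\
&= \mathcal{F}(\ell\circ u) \\
&= F(u;\ell).
\end{aligned}
\]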
Concrete examples of the functional $\mathcal{F}$ are given in the section Cornerness measures.
In the definition above, the cornerness functional $\mathcal{F}$ is an arbitrary functional of the entire normalized image $u^{-1}[\ell]$. In practice, one is always interested in detecting **local features** (at the very least because the image extent is finite).
This is easily obtained by considering a cornerness $\mathcal{F}$ which only looks at a small region of the normalized image, usually corresponding to the extent of the canonical feature $R_0$ (e.g. a unit disc centered at the origin).
In this case the extent of the local feature in the original image is simply given by $R = u[R_0]$.
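As an illustration, and assuming only for this example that the pose is an affine map $u: \mathbf{x} \mapsto A\mathbf{x} + T$ with invertible $A$, the extent of a feature whose canonical region $R_0$ is the unit disc can be written explicitly as
\[
R = u[R_0] = \{ A\mathbf{y} + T : \|\mathbf{y}\| \leq 1 \} = \{ \mathbf{x} : (\mathbf{x} - T)^\top (A A^\top)^{-1} (\mathbf{x} - T) \leq 1 \},
\]
that is, an ellipse centered at $T$. If $A = s Q$ for a scale $s > 0$ and a rotation $Q$ (symbols introduced here for the example), then $A A^\top = s^2 I$ and $R$ reduces to a disc of radius $s$ centered at $T$.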
Practical detectors implement variants of the ideas above. Very often, for instance, detection is an iterative process, in which successive parameters of the pose of a feature are determined. For instance, it is typical to first detect the location and scale of a feature using a rotation-invariant cornerness score $\mathcal{F}$. Once these two parameters are known, the rotation can be determined using a different score, sensitive to the orientation of the local image structures.
Certain detectors (such as Harris-Laplace and Hessian-Laplace) use even more sophisticated schemes, in which different scores are used to estimate jointly (rather than in succession) different parameters of the pose of a feature, such as its translation and scale. While a formal treatment of these cases is possible as well, we point to the original papers.
Dealing with covariant interest point detectors requires working a good deal with derivatives, convolutions, and transformations of images. The notation and fundamental properties of interest here are discussed next.
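For the derivatives, we borrow the notation of \cite{kinghorn96integrals}. Let $f: \mathbb{R}^m \rightarrow \mathbb{R}^n, \mathbf{x} \mapsto f(\mathbf{x})$ be a vector function. The derivative of the function with respect to $\mathbf{x}$ is given by its *Jacobian matrix*, denoted by the symbol:
\[
\frac{\partial f}{\partial \mathbf{x}^\top}
=
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \dots \\
\frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \dots \\
\vdots & \vdots & \ddots
\end{bmatrix}.
\]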
When the function $f$ is scalar ($n=1$), the Jacobian is the same as the gradient of the function (or, in fact, its transpose). More precisely, the gradient $\nabla f$ of $f$ denotes the column vector of partial derivatives:
\[
\nabla f
= \frac{\partial f}{\partial \mathbf{x}}
=
\begin{bmatrix}
\frac{\partial f}{\partial x_1} \\
\frac{\partial f}{\partial x_2} \\
\vdots
\end{bmatrix}.
\]
The second derivative $H_f $ of a scalar function $ f $, or Hessian, is denoted as
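\[
H_f
= \frac{\partial^2 f}{\partial \mathbf{x} \partial \mathbf{x}^\top}
= \frac{\partial \nabla f}{\partial \mathbf{x}^\top}
=
\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1 \partial x_1} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \dots \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2 \partial x_2} & \dots \\
\vdots & \vdots & \ddots
\end{bmatrix}.
\]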
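To make the row and column conventions concrete, consider as a small worked example the quadratic function $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top M \mathbf{x} + \mathbf{b}^\top \mathbf{x}$, where the symmetric matrix $M$ and the vector $\mathbf{b}$ are symbols introduced here only for illustration. In this case
\[
\frac{\partial f}{\partial \mathbf{x}^\top} = \mathbf{x}^\top M + \mathbf{b}^\top,
\qquad
\nabla f = M \mathbf{x} + \mathbf{b},
\qquad
H_f = M,
\]
so the Jacobian is a row vector, the gradient is its transpose, and the Hessian is constant.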
The trace of the Hessian is also known as the Laplacian and is denoted as
\[ \Delta f = \operatorname{tr} H_f = \frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} + \dots \]
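In the following, we will often be interested in domain warpings $u: \mathbb{R}^m \rightarrow \mathbb{R}^m, \mathbf{x} \mapsto u(\mathbf{x})$ of a function $f(\mathbf{x})$ and their effect on the derivatives of the function. The key transformation is the chain rule:
\[ \frac{\partial f \circ u}{\partial \mathbf{x}^\top} = \left(\frac{\partial f}{\partial \mathbf{x}^\top} \circ u\right) \frac{\partial u}{\partial \mathbf{x}^\top} \]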
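In particular, for an affine transformation $u = (A,T) : \mathbf{x} \mapsto A\mathbf{x} + T$, one obtains the transformation rules (the last three for scalar $f$):
\[
\begin{aligned}
\frac{\partial f \circ (A,T)}{\partial \mathbf{x}^\top} &= \left(\frac{\partial f}{\partial \mathbf{x}^\top} \circ (A,T)\right)A, \\
\nabla (f \circ (A,T)) &= A^\top (\nabla f) \circ (A,T), \\
H_{f \circ (A,T)} &= A^\top (H_f \circ (A,T)) A, \\
\det H_{f \circ (A,T)} &= \det(A)^2\, (\det H_f) \circ (A,T).
\end{aligned}
\]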
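As a worked special case, assume a scalar function $f$ on the plane ($m = 2$) and an isotropic rescaling $u: \mathbf{x} \mapsto s\mathbf{x} + T$, i.e. $A = sI$ with $s > 0$ (the symbol $s$ is introduced here for the example). Applying the chain rule to $f \circ u$ gives
\[
\nabla (f \circ u) = s\, (\nabla f) \circ u,
\qquad
H_{f \circ u} = s^2\, H_f \circ u,
\qquad
\det H_{f \circ u} = s^4\, (\det H_f) \circ u.
\]
If instead $A = Q$ is a rotation, so that $Q^\top Q = I$ and $\det Q = 1$, then $H_{f \circ u} = Q^\top (H_f \circ u) Q$ and $\det H_{f \circ u} = (\det H_f) \circ u$: the determinant of the Hessian takes the same value at corresponding points of the original and the rotated function.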
In practice, given an image $\ell$ expressed in digital format, good derivative approximations can be computed only if the bandwidth of the image is limited and, in particular, compatible with the sampling density. Since it is unreasonable to expect real images to be band-limited, the bandwidth is artificially constrained by suitably smoothing the image prior to computing its derivatives. This is also interpreted as a form of regularization or as a way of focusing on the image content at a particular scale.
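Formally, we will focus on Gaussian smoothing kernels. For the 2D case $\mathbf{x}\in\mathbb{R}^2$, the Gaussian kernel of covariance $\Sigma$ is given by
\[ g_{\Sigma}(\mathbf{x}) = \frac{1}{2\pi \sqrt{\det\Sigma}} \exp\left( - \frac{1}{2} \mathbf{x}^\top \Sigma^{-1} \mathbf{x} \right). \]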
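The symbol $g_{\sigma^2}$ will be used to denote a Gaussian kernel with isotropic standard deviation $\sigma$, i.e. $\Sigma = \sigma^2 I$. Given an image $\ell$, the symbol $\ell_\Sigma$ will be used to denote the image smoothed by the Gaussian kernel of parameter $\Sigma$:
\[ \ell_\Sigma(\mathbf{x}) = (g_\Sigma * \ell)(\mathbf{x}) = \int_{\mathbb{R}^m} g_\Sigma(\mathbf{x} - \mathbf{y})\, \ell(\mathbf{y})\,d\mathbf{y}. \]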
One advantage of Gaussian kernels is that they are (up to renormalization) closed under a linear warp:
\[ |\det A|\, g_\Sigma \circ A = g_{A^{-1} \Sigma A^{-\top}} \]
This also means that smoothing a warped image is the same as warping the result of smoothing the original image by a suitably adjusted Gaussian kernel:
\[ g_{\Sigma} * (\ell \circ (A,T)) = (g_{A \Sigma A^\top} * \ell) \circ (A,T). \]
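A useful special case, worked out here under the assumption of an isotropic kernel $\Sigma = \sigma^2 I$ and a similarity warp $A = sQ$ with scale $s > 0$ and rotation $Q$ (symbols introduced for this example), follows from $A \Sigma A^\top = s^2 \sigma^2 Q Q^\top = s^2 \sigma^2 I$:
\[
g_{\sigma^2} * (\ell \circ (sQ,T)) = (g_{s^2\sigma^2} * \ell) \circ (sQ,T).
\]
In words, smoothing the warped image with standard deviation $\sigma$ gives the same result as smoothing the original image with standard deviation $s\sigma$ and then warping it. For a pure rotation ($s = 1$) the kernel is unchanged, so isotropic Gaussian smoothing commutes with rotations and translations of the image.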