Gaussian scale space fundamentals

This page discusses the notion of *Gaussian scale space* and the related data structure. For the C API see scalespace.h and Getting started.

A *scale space* is a representation of an image at multiple resolution levels. An image is a function $\ell(x,y)$ of two coordinates $x$, $y$; the scale space $\ell(x,y,\sigma)$ adds a third coordinate $\sigma$ indexing the *scale*. Here the focus is the Gaussian scale space, where the image $\ell(x,y,\sigma)$ is obtained by smoothing $\ell(x,y)$ with a Gaussian kernel of isotropic standard deviation $\sigma$.

Scale space definition

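Formally, the *Gaussian scale space* of an image $\ell(x,y)$ is defined as

\[ \ell(x,y,\sigma) = [g_{\sigma} * \ell](x,y) \]

where $g_\sigma$ denotes a 2D Gaussian kernel of isotropic standard deviation $\sigma$:

\[ g_{\sigma}(x,y) = \frac{1}{2\pi\sigma^2} \exp\left( - \frac{x^2 + y^2}{2\sigma^2} \right). \]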

An important detail is that the algorithm computing the scale space assumes that the input image $\ell(x,y)$ is pre-smoothed, roughly capturing the effect of the finite pixel size in a CCD. This is modelled by assuming that the input is not $\ell(x,y)$, but $\ell(x,y,\sigma_n)$, where $\sigma_n$ is a *nominal smoothing*, usually taken to be 0.5 (half a pixel standard deviation). This also means that $\sigma = \sigma_n = 0.5$ is the *finest scale* that can actually be computed.

The scale space structure stores samples of the function $\ell(x,y,\sigma)$. The density of the sampling of the spatial coordinates $x$ and $y$ is adjusted as a function of the scale $\sigma$, corresponding to the intuition that images at a coarse resolution can be sampled more coarsely without loss of information. Thus, the scale space has the structure of a *pyramid*: a collection of digital images sampled at progressively coarser spatial resolution and hence of progressively smaller size (in pixels).

The following figure illustrates the scale space pyramid structure:

Figure (scalespace-basic.png): A scale space structure with 2 octaves and S=3 subdivisions per octave.

The pyramid is organised in a number of *octaves*, indexed by a parameter `o`. Each octave is further subdivided into *sublevels*, indexed by a parameter `s`. These are related to the scale $\sigma$ by the equation

\[ \sigma(s,o) = \sigma_0 2^{\displaystyle o + \frac{s}{\mathtt{octaveResolution}}} \]

where `octaveResolution` is the resolution of the octave subsampling (the number of sublevels per octave) and $\sigma_0$ is the *base smoothing*.

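At each octave the sampling step doubles, so the spatial resolution is halved, in the sense that samples are taken with a step of \[ \mathtt{step} = 2^o. \] Hence, denoting as `level[i,j]` the corresponding samples, one has $\ell(x,y,\sigma) = \mathtt{level}[i,j]$, where \[ (x,y) = (i,j) \times \mathtt{step}, \quad \sigma = \sigma(s,o), \quad 0 \leq i < \mathtt{lwidth}, \quad 0 \leq j < \mathtt{lheight}, \] and \[ \mathtt{lwidth} = \left\lfloor \frac{\mathtt{width}}{2^{o}} \right\rfloor, \quad \mathtt{lheight} = \left\lfloor \frac{\mathtt{height}}{2^{o}} \right\rfloor. \]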

Scale space geometry

In addition to the parameters discussed above, the geometry of the data stored in a scale space structure depends on the range of allowable octaves `o` and scale sublevels `s`.

While `o` may take any reasonable value given the size of the input image `image`, its minimum value is usually either 0 or -1. The latter corresponds to doubling the resolution of the image in the first octave of the scale space and is often used in feature extraction. While upsampling in this manner adds no information to the image, fine-scale filters, including derivative filters, are much easier to compute after upsampling. The maximum practical value is dictated by the image resolution, as it should satisfy $2^o \leq \min\{\mathtt{width},\mathtt{height}\}$.
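As an illustration of the upper bound on `o`, the sketch below returns the largest octave whose sampling step still fits in the smaller image side. It assumes the constraint is $2^o \leq \min\{\mathtt{width},\mathtt{height}\}$; the function name is hypothetical and not part of the C API.

```c
/* Largest octave index o such that 2^o <= min(width, height).
 * Returns -1 if either dimension is smaller than one pixel.
 * Hypothetical helper, assuming the bound stated in the lead-in. */
int max_octave(int width, int height)
{
  int m = width < height ? width : height;
  int o = 0;
  if (m < 1) {
    return -1;
  }
  /* Each halving of m that leaves at least one pixel allows one more octave. */
  while (m >= 2) {
    m /= 2;
    ++o;
  }
  return o;
}
```

For a 640 x 480 image this gives 8, since $2^8 = 256 \leq 480 < 512 = 2^9$. In practice a smaller last octave may be preferable, because the coarsest levels contain very few samples.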



libvlfeat
Author(s): Andrea Vedaldi
autogenerated on Thu Jun 6 2019 20:25:52