Public Member Functions | Private Types | Private Member Functions | Private Attributes | Static Private Attributes | Friends | List of all members
GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t > Class Template Reference

Nearest-neighbor calculations. More...

#include <NearestNeighbor.hpp>

Public Member Functions

void Initialize (const std::vector< pos_t > &pts, const distfun_t &dist, int bucket=4)
 
void Load (std::istream &is, bool bin=true)
 
 NearestNeighbor ()
 
 NearestNeighbor (const std::vector< pos_t > &pts, const distfun_t &dist, int bucket=4)
 
int NumPoints () const
 
void ResetStatistics () const
 
void Save (std::ostream &os, bool bin=true) const
 
dist_t Search (const std::vector< pos_t > &pts, const distfun_t &dist, const pos_t &query, std::vector< int > &ind, int k=1, dist_t maxdist=std::numeric_limits< dist_t >::max(), dist_t mindist=-1, bool exhaustive=true, dist_t tol=0) const
 
void Statistics (int &setupcost, int &numsearches, int &searchcost, int &mincost, int &maxcost, double &mean, double &sd) const
 
void swap (NearestNeighbor &t)
 

Private Types

typedef std::pair< dist_t, intitem
 

Private Member Functions

int init (const std::vector< pos_t > &pts, const distfun_t &dist, int bucket, std::vector< Node > &tree, std::vector< item > &ids, int &cost, int l, int u, int vp)
 

Private Attributes

int _bucket
 
int _c1
 
int _cmax
 
int _cmin
 
int _cost
 
int _k
 
double _mc
 
int _numpoints
 
double _sc
 
std::vector< Node > _tree
 

Static Private Attributes

static const int maxbucket
 
static const int version = 1
 

Friends

std::ostream & operator<< (std::ostream &os, const NearestNeighbor &t)
 
std::istream & operator>> (std::istream &is, NearestNeighbor &t)
 

Detailed Description

template<typename dist_t, typename pos_t, class distfun_t>
class GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >

Nearest-neighbor calculations.

This class solves the nearest-neighbor problm using a vantage-point tree as described in nearest.

This class is templated so that it can handle arbitrary metric spaces as follows:

Template Parameters
dist_tthe type used for measuring distances; it can be a real or signed integer type; in typical geodetic applications, dist_t might be double.
pos_tthe type for specifying the positions of points; geodetic application might bundled the latitude and longitude into a std::pair<dist_t, dist_t>.
distfun_tthe type of a function object which takes takes two positions (of type pos_t) and returns the distance (of type dist_t); in geodetic applications, this might be a class which is constructed with a Geodesic object and which implements a member function with a signature dist_t operator() (const pos_t&, const pos_t&) const, which returns the geodesic distance between two points.
Note
The distance measure must satisfy the triangle inequality, $ d(a,c) \le d(a,b) + d(b,c) $ for all points a, b, c. The geodesic distance (given by Geodesic::Inverse) does, while the great ellipse distance and the rhumb line distance do not. If you use the ordinary Euclidean distance, i.e., $ \sqrt{(x_a-x_b)^2 + (y_a-y_b)^2} $ for two dimensions, don't be tempted to leave out the square root in the interests of "efficiency"; the squared distance does not satisfy the triangle inequality!

This is a "header-only" implementation and, as such, depends in a minimal way on the rest of GeographicLib (the only dependency is through the use of GEOGRAPHICLIB_STATIC_ASSERT and GeographicLib::GeographicErr for handling run-time and compile-time exceptions). Therefore, it is easy to extract this class from the rest of GeographicLib and use it as a stand-alone facility.

The dist_t type must support numeric_limits queries (specifically: is_signed, is_integer, max(), digits).

The NearestNeighbor object is constructed with a vector of points (type pos_t) and a distance function (type distfun_t). However the object does not store the points. When querying the object with Search(), it's necessary to supply the same vector of points and the same distance function.

There's no capability in this implementation to add or remove points from the set. Instead Initialize() should be called to re-initialize the object with the modified vector of points.

Because of the overhead in constructing a NearestNeighbor object for a large set of points, functions Save() and Load() are provided to save the object to an external file. operator<<(), operator>>() and Boost serialization can also be used to save and restore a NearestNeighbor object. This is illustrated in the example.

Example of use:

// Example of using the GeographicLib::NearestNeighbor class. WARNING: this
// creates a file, pointset.xml or pointset.txt, in the current directory.
// Read lat/lon locations from locations.txt and lat/lon queries from
// queries.txt. For each query print to standard output: the index for the
// closest location and the distance to it. Print statistics to standard error
// at the end.
#include <iostream>
#include <exception>
#include <vector>
#include <fstream>
#include <string>
#if !defined(GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION)
#define GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION 0
#endif
#if GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION
// If Boost serialization is available, use it.
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#endif
using namespace std;
using namespace GeographicLib;
// A structure to hold a geographic coordinate.
struct pos {
double _lat, _lon;
pos(double lat = 0, double lon = 0) : _lat(lat), _lon(lon) {}
};
// A class to compute the distance between 2 positions.
private:
Geodesic _geod;
public:
explicit DistanceCalculator(const Geodesic& geod) : _geod(geod) {}
double operator() (const pos& a, const pos& b) const {
double d;
_geod.Inverse(a._lat, a._lon, b._lat, b._lon, d);
if ( !(d >= 0) )
// Catch illegal positions which result in d = NaN
throw GeographicErr("distance doesn't satisfy d >= 0");
return d;
}
};
int main() {
try {
// Read in locations
vector<pos> locs;
double lat, lon;
string sa, sb;
{
ifstream is("locations.txt");
if (!is.good())
throw GeographicErr("locations.txt not readable");
while (is >> sa >> sb) {
DMS::DecodeLatLon(sa, sb, lat, lon);
locs.push_back(pos(lat, lon));
}
if (locs.size() == 0)
throw GeographicErr("need at least one location");
}
// Define a distance function object
// Create NearestNeighbor object
{
// Used saved object if it is available
#if GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION
ifstream is("pointset.xml");
if (is.good()) {
boost::archive::xml_iarchive ia(is);
ia >> BOOST_SERIALIZATION_NVP(pointset);
}
#else
ifstream is("pointset.txt");
if (is.good())
is >> pointset;
#endif
}
// Is the saved pointset up-to-date?
if (pointset.NumPoints() != int(locs.size())) {
// else initialize it
pointset.Initialize(locs, distance);
// and save it
#if GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION
ofstream os("pointset.xml");
if (!os.good())
throw GeographicErr("cannot write to pointset.xml");
boost::archive::xml_oarchive oa(os);
oa << BOOST_SERIALIZATION_NVP(pointset);
#else
ofstream os("pointset.txt");
if (!os.good())
throw GeographicErr("cannot write to pointset.txt");
os << pointset << "\n";
#endif
}
ifstream is("queries.txt");
double d;
int count = 0;
vector<int> k;
while (is >> sa >> sb) {
++count;
DMS::DecodeLatLon(sa, sb, lat, lon);
d = pointset.Search(locs, distance, pos(lat, lon), k);
if (k.size() != 1)
throw GeographicErr("unexpected number of results");
cout << k[0] << " " << d << "\n";
}
int setupcost, numsearches, searchcost, mincost, maxcost;
double mean, sd;
pointset.Statistics(setupcost, numsearches, searchcost,
mincost, maxcost, mean, sd);
int
totcost = setupcost + searchcost,
exhaustivecost = count * pointset.NumPoints();
cerr
<< "Number of distance calculations = " << totcost << "\n"
<< "With an exhaustive search = " << exhaustivecost << "\n"
<< "Ratio = " << double(totcost) / exhaustivecost << "\n"
<< "Efficiency improvement = "
<< 100 * (1 - double(totcost) / exhaustivecost) << "%\n";
}
catch (const exception& e) {
cerr << "Caught exception: " << e.what() << "\n";
return 1;
}
}

Definition at line 104 of file NearestNeighbor.hpp.

Member Typedef Documentation

template<typename dist_t, typename pos_t, class distfun_t>
typedef std::pair<dist_t, int> GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::item
private

Definition at line 609 of file NearestNeighbor.hpp.

Constructor & Destructor Documentation

template<typename dist_t, typename pos_t, class distfun_t>
GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::NearestNeighbor ( )
inline

Default constructor for NearestNeighbor.

This is equivalent to specifying an empty set of points.

Definition at line 119 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::NearestNeighbor ( const std::vector< pos_t > &  pts,
const distfun_t &  dist,
int  bucket = 4 
)
inline

Constructor for NearestNeighbor.

Parameters
[in]ptsa vector of points to include in the set.
[in]distthe distance function object.
[in]bucketthe size of the buckets at the leaf nodes; this must lie in [0, 2 + 4*sizeof(dist_t)/sizeof(int)] (default 4).
Exceptions
GeographicErrif the value of bucket is out of bounds or the size of pts is too big for an int.
std::bad_allocif memory for the tree can't be allocated.

pts may contain coincident points (i.e., the distance between them vanishes); these are treated as distinct.

The choice of bucket is a tradeoff between space and efficiency. A larger bucket decreases the size of the NearestNeighbor object which scales as pts.size() / max(1, bucket) and reduces the number of distance calculations to construct the object by log2(bucket) * pts.size(). However each search then requires about bucket additional distance calculations.

Warning
The distances computed by dist must satisfy the standard metric conditions. If not, the results are undefined. Neither the data in pts nor the query points should contain NaNs or infinities because such data violates the metric conditions.
The same arguments pts and dist must be provided to the Search() function.

Definition at line 150 of file NearestNeighbor.hpp.

Member Function Documentation

template<typename dist_t, typename pos_t, class distfun_t>
int GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::init ( const std::vector< pos_t > &  pts,
const distfun_t &  dist,
int  bucket,
std::vector< Node > &  tree,
std::vector< item > &  ids,
int cost,
int  l,
int  u,
int  vp 
)
inlineprivate

Definition at line 746 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
void GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::Initialize ( const std::vector< pos_t > &  pts,
const distfun_t &  dist,
int  bucket = 4 
)
inline

Initialize or re-initialize NearestNeighbor.

Parameters
[in]ptsa vector of points to include in the tree.
[in]distthe distance function object.
[in]bucketthe size of the buckets at the leaf nodes; this must lie in [0, 2 + 4*sizeof(dist_t)/sizeof(int)] (default 4).
Exceptions
GeographicErrif the value of bucket is out of bounds or the size of pts is too big for an int.
std::bad_allocif memory for the tree can't be allocated.

See also the documentation on the constructor.

If an exception is thrown, the state of the NearestNeighbor is unchanged.

Definition at line 171 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
void GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::Load ( std::istream &  is,
bool  bin = true 
)
inline

Read the object from an I/O stream.

Parameters
[in,out]isthe stream to read from
[in]binif true (the default) load in binary mode.
Exceptions
GeographicErrif the state read from is is illegal.
std::bad_allocif memory for the tree can't be allocated.

The counters tracking the statistics of searches are reset by this operation. Binary data must have been saved on a machine with the same architecture. If an exception is thrown, the state of the NearestNeighbor is unchanged.

Note
Boost serialization can also be used to save and restore a NearestNeighbor object. This requires that the GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION macro be defined.
Warning
The same arguments pts and dist used for initialization must be provided to the Search() function.

Definition at line 453 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
int GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::NumPoints ( ) const
inline
Returns
the total number of points in the set.

Definition at line 356 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
void GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::ResetStatistics ( ) const
inline

Reset the counters for the accumulated statistics on the searches so far.

Definition at line 600 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
void GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::Save ( std::ostream &  os,
bool  bin = true 
) const
inline

Write the object to an I/O stream.

Parameters
[in,out]osthe stream to write to.
[in]binif true (the default) save in binary mode.
Exceptions
std::bad_allocif memory for the string representation of the object can't be allocated.

The counters tracking the statistics of searches are not saved; however the initializtion cost is saved. The format of the binary saves is not portable.

Note
Boost serialization can also be used to save and restore a NearestNeighbor object. This requires that the GEOGRAPHICLIB_HAVE_BOOST_SERIALIZATION macro be defined.

Definition at line 375 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
dist_t GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::Search ( const std::vector< pos_t > &  pts,
const distfun_t &  dist,
const pos_t &  query,
std::vector< int > &  ind,
int  k = 1,
dist_t  maxdist = std::numeric_limits<dist_t>::max(),
dist_t  mindist = -1,
bool  exhaustive = true,
dist_t  tol = 0 
) const
inline

Search the NearestNeighbor.

Parameters
[in]ptsthe vector of points used for initialization.
[in]distthe distance function object used for initialization.
[in]querythe query point.
[out]inda vector of indices to the closest points found.
[in]kthe number of points to search for (default = 1).
[in]maxdistonly return points with distances of maxdist or less from query (default is the maximum dist_t).
[in]mindistonly return points with distances of more than mindist from query (default = −1).
[in]exhaustivewhether to do an exhaustive search (default true).
[in]tolthe tolerance on the results (default 0).
Returns
the distance to the closest point found (−1 if no points are found).
Exceptions
GeographicErrif pts has a different size from that used to construct the object.

The indices returned in ind are sorted by distance from query (closest first).

The simplest invocation is with just the 4 non-optional arguments. This returns the closest distance and the index to the closest point in ind0. If there are several points equally close, then ind0 gives the index of an arbirary one of them. If there's no closest point (because the set of points is empty), then ind is empty and −1 is returned.

With exhaustive = true and tol = 0 (their default values), this finds the indices of k closest neighbors to query whose distances to query are in (mindist, maxdist]. If mindist and maxdist have their default values, then these bounds have no effect. If query is one of the points in the tree, then set mindist = 0 to prevent this point (and other coincident points) from being returned.

If exhaustive = false, exit as soon as k results satisfying the distance criteria are found. If less than k results are returned then the search was exhaustive even if exhaustive = false.

If tol is positive, do an approximate search; in this case the results are to be interpreted as follows: if the k'th distance is dk, then all results with distances less than or equal dktol are correct; all others are suspect — there may be other closer results with distances greater or equal to dktol. If less than k results are found, then the search is exact.

mindist should be used to exclude a "small" neighborhood of the query point (relative to the average spacing of the data). If mindist is large, the efficiency of the search deteriorates.

Note
Only the shortest distance is returned (as as the function value). The distances to other points (indexed by indj for j > 0) can be found by invoking dist again.
Warning
The arguments pts and dist must be identical to those used to initialize the NearestNeighbor; if not, this function will return some meaningless result (however, if the size of pts is wrong, this function throw an exception).
The query point cannot be a NaN or infinite because then the metric conditions are violated.

Definition at line 259 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
void GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::Statistics ( int setupcost,
int numsearches,
int searchcost,
int mincost,
int maxcost,
double &  mean,
double &  sd 
) const
inline

The accumulated statistics on the searches so far.

Parameters
[out]setupcostthe cost of initializing the NearestNeighbor.
[out]numsearchesthe number of calls to Search().
[out]searchcostthe total cost of the calls to Search().
[out]mincostthe minimum cost of a Search().
[out]maxcostthe maximum cost of a Search().
[out]meanthe mean cost of a Search().
[out]sdthe standard deviation in the cost of a Search().

Here "cost" measures the number of distance calculations needed. Note that the accumulation of statistics is not thread safe.

Definition at line 588 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
void GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::swap ( NearestNeighbor< dist_t, pos_t, distfun_t > &  t)
inline

Swap with another NearestNeighbor object.

Parameters
[in,out]tthe NearestNeighbor object to swap with.

Definition at line 561 of file NearestNeighbor.hpp.

Friends And Related Function Documentation

template<typename dist_t, typename pos_t, class distfun_t>
std::ostream& operator<< ( std::ostream &  os,
const NearestNeighbor< dist_t, pos_t, distfun_t > &  t 
)
friend

Write the object to stream os as text.

Parameters
[in,out]osthe output stream.
[in]tthe NearestNeighbor object to be saved.
Exceptions
std::bad_allocif memory for the string representation of the object can't be allocated.

Definition at line 542 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
std::istream& operator>> ( std::istream &  is,
NearestNeighbor< dist_t, pos_t, distfun_t > &  t 
)
friend

Read the object from stream is as text.

Parameters
[in,out]isthe input stream.
[out]tthe NearestNeighbor object to be loaded.
Exceptions
GeographicErrif the state read from is is illegal.
std::bad_allocif memory for the tree can't be allocated.

Definition at line 553 of file NearestNeighbor.hpp.

Member Data Documentation

template<typename dist_t, typename pos_t, class distfun_t>
int GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::_bucket
private

Definition at line 740 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
int GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::_c1
mutableprivate

Definition at line 744 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
int GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::_cmax
mutableprivate

Definition at line 744 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
int GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::_cmin
mutableprivate

Definition at line 744 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
int GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::_cost
private

Definition at line 740 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
int GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::_k
mutableprivate

Definition at line 744 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
double GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::_mc
mutableprivate

Definition at line 743 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
int GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::_numpoints
private

Definition at line 740 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
double GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::_sc
mutableprivate

Definition at line 743 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
std::vector<Node> GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::_tree
private

Definition at line 741 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
const int GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::maxbucket
staticprivate
Initial value:
=
(2 + ((4 * sizeof(dist_t)) / sizeof(int) >= 2 ?
(4 * sizeof(dist_t)) / sizeof(int) : 2))

Definition at line 109 of file NearestNeighbor.hpp.

template<typename dist_t, typename pos_t, class distfun_t>
const int GeographicLib::NearestNeighbor< dist_t, pos_t, distfun_t >::version = 1
staticprivate

Definition at line 106 of file NearestNeighbor.hpp.


The documentation for this class was generated from the following file:


gtsam
Author(s):
autogenerated on Sat May 8 2021 02:58:00