mlpack  2.0.1
Public Member Functions | Private Member Functions | Private Attributes | List of all members
mlpack::neighbor::LSHSearch< SortPolicy > Class Template Reference

The LSHSearch class; this class builds a hash on the reference set and uses this hash to compute the distance-approximate nearest-neighbors of the given queries. More...

Public Member Functions

 LSHSearch (const arma::mat &referenceSet, const size_t numProj, const size_t numTables, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500)
 This function initializes the LSH class. More...
 
 LSHSearch ()
 Create an untrained LSH model. More...
 
 ~LSHSearch ()
 Clean memory. More...
 
size_t BucketSize () const
 Get the bucket size of the second hash. More...
 
size_t DistanceEvaluations () const
 Return the number of distance evaluations performed. More...
 
size_t & DistanceEvaluations ()
 Modify the number of distance evaluations performed. More...
 
size_t NumProjections () const
 Get the number of projections. More...
 
const arma::mat & Offsets () const
 Get the offsets 'b' for each of the projections. (One 'b' per column.) More...
 
const arma::mat & Projection (const size_t i) const
 Get the projection matrix of the given table. More...
 
const arma::mat & ReferenceSet () const
 Return the reference dataset. More...
 
void Search (const arma::mat &querySet, const size_t k, arma::Mat< size_t > &resultingNeighbors, arma::mat &distances, const size_t numTablesToSearch=0)
 Compute the nearest neighbors of the points in the given query set and store the output in the given matrices. More...
 
void Search (const size_t k, arma::Mat< size_t > &resultingNeighbors, arma::mat &distances, const size_t numTablesToSearch=0)
 Compute the nearest neighbors and store the output in the given matrices. More...
 
const arma::Mat< size_t > & SecondHashTable () const
 Get the second hash table. More...
 
const arma::vec & SecondHashWeights () const
 Get the weights of the second hash. More...
 
template<typename Archive >
void Serialize (Archive &ar, const unsigned int)
 Serialize the LSH model. More...
 
void Train (const arma::mat &referenceSet, const size_t numProj, const size_t numTables, const double hashWidth=0.0, const size_t secondHashSize=99901, const size_t bucketSize=500)
 Train the LSH model on the given dataset. More...
 

Private Member Functions

void BaseCase (const size_t queryIndex, const size_t referenceIndex, arma::Mat< size_t > &neighbors, arma::mat &distances) const
 This is a helper function that computes the distance of the query to the neighbor candidates and appropriately stores the best 'k' candidates. More...
 
void BaseCase (const size_t queryIndex, const size_t referenceIndex, const arma::mat &querySet, arma::Mat< size_t > &neighbors, arma::mat &distances) const
 This is a helper function that computes the distance of the query to the neighbor candidates and appropriately stores the best 'k' candidates. More...
 
void BuildHash ()
 This function builds a hash table with two levels of hashing as presented in the paper. More...
 
void InsertNeighbor (arma::mat &distances, arma::Mat< size_t > &neighbors, const size_t queryIndex, const size_t pos, const size_t neighbor, const double distance) const
 This is a helper function that efficiently inserts better neighbor candidates into an existing set of neighbor candidates. More...
 
template<typename VecType >
void ReturnIndicesFromTable (const VecType &queryPoint, arma::uvec &referenceIndices, size_t numTablesToSearch) const
 This function takes a query and hashes it into each of the hash tables to get keys for the query and then the key is hashed to a bucket of the second hash table and all the points (if any) in those buckets are collected as the potential neighbor candidates. More...
 

Private Attributes

arma::Col< size_t > bucketContentSize
 The number of elements present in each hash bucket; should be secondHashSize. More...
 
arma::Col< size_t > bucketRowInHashTable
 For a particular hash value, points to the row in secondHashTable corresponding to this value. More...
 
size_t bucketSize
 The bucket size of the second hash. More...
 
size_t distanceEvaluations
 The number of distance evaluations. More...
 
double hashWidth
 The hash width. More...
 
size_t numProj
 The number of projections. More...
 
size_t numTables
 The number of hash tables. More...
 
arma::mat offsets
 The list of the offsets 'b' for each of the projection for each table. More...
 
bool ownsSet
 If true, we own the reference set. More...
 
std::vector< arma::mat > projections
 The std::vector containing the projection matrix of each table. More...
 
const arma::mat * referenceSet
 Reference dataset. More...
 
size_t secondHashSize
 The big prime representing the size of the second hash. More...
 
arma::Mat< size_t > secondHashTable
 The final hash table; should be (< secondHashSize) x bucketSize. More...
 
arma::vec secondHashWeights
 The weights of the second hash. More...
 

Detailed Description

template<typename SortPolicy = NearestNeighborSort>
class mlpack::neighbor::LSHSearch< SortPolicy >

The LSHSearch class; this class builds a hash on the reference set and uses this hash to compute the distance-approximate nearest-neighbors of the given queries.

Template Parameters
SortPolicyThe sort policy for distances; see NearestNeighborSort.

Definition at line 51 of file lsh_search.hpp.

Constructor & Destructor Documentation

template<typename SortPolicy = NearestNeighborSort>
mlpack::neighbor::LSHSearch< SortPolicy >::LSHSearch ( const arma::mat &  referenceSet,
const size_t  numProj,
const size_t  numTables,
const double  hashWidth = 0.0,
const size_t  secondHashSize = 99901,
const size_t  bucketSize = 500 
)

This function initializes the LSH class.

It builds the hash on the reference set with 2-stable distributions. See the individual functions performing the hashing for details on how the hashing is done.

Parameters
referenceSetSet of reference points and the set of queries.
numProjNumber of projections in each hash table (anything between 10-50 might be a decent choice).
numTablesTotal number of hash tables (anything between 10-20 should suffice).
hashWidthThe width of hash for every table. If 0 (the default) is provided, then the hash width is automatically obtained by computing the average pairwise distance of 25 pairs. This should be a reasonable upper bound on the nearest-neighbor distance in general.
secondHashSizeThe size of the second hash table. This should be a large prime number.
bucketSizeThe size of the bucket in the second hash table. This is the maximum number of points that can be hashed into single bucket. Default values are already provided here.
template<typename SortPolicy = NearestNeighborSort>
mlpack::neighbor::LSHSearch< SortPolicy >::LSHSearch ( )

Create an untrained LSH model.

Be sure to call Train() before calling Search(); otherwise, an exception will be thrown when Search() is called.

template<typename SortPolicy = NearestNeighborSort>
mlpack::neighbor::LSHSearch< SortPolicy >::~LSHSearch ( )

Clean memory.

Member Function Documentation

template<typename SortPolicy = NearestNeighborSort>
void mlpack::neighbor::LSHSearch< SortPolicy >::BaseCase ( const size_t  queryIndex,
const size_t  referenceIndex,
arma::Mat< size_t > &  neighbors,
arma::mat &  distances 
) const
private

This is a helper function that computes the distance of the query to the neighbor candidates and appropriately stores the best 'k' candidates.

This is specific to the monochromatic search case, where the query set is the reference set.

Parameters
queryIndexThe index of the query in question
referenceIndexThe index of the neighbor candidate in question
neighborsMatrix holding output neighbors.
distancesMatrix holding output distances.
template<typename SortPolicy = NearestNeighborSort>
void mlpack::neighbor::LSHSearch< SortPolicy >::BaseCase ( const size_t  queryIndex,
const size_t  referenceIndex,
const arma::mat &  querySet,
arma::Mat< size_t > &  neighbors,
arma::mat &  distances 
) const
private

This is a helper function that computes the distance of the query to the neighbor candidates and appropriately stores the best 'k' candidates.

This is specific to bichromatic search, where the query set is not the same as the reference set.

Parameters
queryIndexThe index of the query in question
referenceIndexThe index of the neighbor candidate in question
querySetSet of query points.
neighborsMatrix holding output neighbors.
distancesMatrix holding output distances.
template<typename SortPolicy = NearestNeighborSort>
size_t mlpack::neighbor::LSHSearch< SortPolicy >::BucketSize ( ) const
inline

Get the bucket size of the second hash.

Definition at line 179 of file lsh_search.hpp.

References mlpack::neighbor::LSHSearch< SortPolicy >::bucketSize.

template<typename SortPolicy = NearestNeighborSort>
void mlpack::neighbor::LSHSearch< SortPolicy >::BuildHash ( )
private

This function builds a hash table with two levels of hashing as presented in the paper.

This function first hashes the points with 'numProj' random projections to a single hash table creating (key, point ID) pairs where the key is a 'numProj'-dimensional integer vector.

Then each key in this hash table is hashed into a second hash table using a standard hash.

This function does not have any parameters and relies on parameters which are private members of this class, initialized during the class initialization.

template<typename SortPolicy = NearestNeighborSort>
size_t mlpack::neighbor::LSHSearch< SortPolicy >::DistanceEvaluations ( ) const
inline

Return the number of distance evaluations performed.

Definition at line 160 of file lsh_search.hpp.

References mlpack::neighbor::LSHSearch< SortPolicy >::distanceEvaluations.

template<typename SortPolicy = NearestNeighborSort>
size_t& mlpack::neighbor::LSHSearch< SortPolicy >::DistanceEvaluations ( )
inline

Modify the number of distance evaluations performed.

Definition at line 162 of file lsh_search.hpp.

References mlpack::neighbor::LSHSearch< SortPolicy >::distanceEvaluations.

template<typename SortPolicy = NearestNeighborSort>
void mlpack::neighbor::LSHSearch< SortPolicy >::InsertNeighbor ( arma::mat &  distances,
arma::Mat< size_t > &  neighbors,
const size_t  queryIndex,
const size_t  pos,
const size_t  neighbor,
const double  distance 
) const
private

This is a helper function that efficiently inserts better neighbor candidates into an existing set of neighbor candidates.

This function is only called by the 'BaseCase' function.

Parameters
distancesMatrix holding output distances.
neighborsMatrix holding output neighbors.
queryIndexThis is the index of the query being processed currently
posThe position of the neighbor candidate in the current list of neighbor candidates.
neighborThe neighbor candidate that is being inserted into the list of the best 'k' candidates for the query in question.
distanceThe distance of the query to the neighbor candidate.
template<typename SortPolicy = NearestNeighborSort>
size_t mlpack::neighbor::LSHSearch< SortPolicy >::NumProjections ( ) const
inline

Get the number of projections.

Definition at line 168 of file lsh_search.hpp.

References mlpack::neighbor::LSHSearch< SortPolicy >::projections.

template<typename SortPolicy = NearestNeighborSort>
const arma::mat& mlpack::neighbor::LSHSearch< SortPolicy >::Offsets ( ) const
inline

Get the offsets 'b' for each of the projections. (One 'b' per column.)

Definition at line 173 of file lsh_search.hpp.

References mlpack::neighbor::LSHSearch< SortPolicy >::offsets.

template<typename SortPolicy = NearestNeighborSort>
const arma::mat& mlpack::neighbor::LSHSearch< SortPolicy >::Projection ( const size_t  i) const
inline

Get the projection matrix of the given table.

Definition at line 170 of file lsh_search.hpp.

References mlpack::neighbor::LSHSearch< SortPolicy >::projections.

template<typename SortPolicy = NearestNeighborSort>
const arma::mat& mlpack::neighbor::LSHSearch< SortPolicy >::ReferenceSet ( ) const
inline

Return the reference dataset.

Definition at line 165 of file lsh_search.hpp.

References mlpack::neighbor::LSHSearch< SortPolicy >::referenceSet.

template<typename SortPolicy = NearestNeighborSort>
template<typename VecType >
void mlpack::neighbor::LSHSearch< SortPolicy >::ReturnIndicesFromTable ( const VecType &  queryPoint,
arma::uvec &  referenceIndices,
size_t  numTablesToSearch 
) const
private

This function takes a query and hashes it into each of the hash tables to get keys for the query and then the key is hashed to a bucket of the second hash table and all the points (if any) in those buckets are collected as the potential neighbor candidates.

Parameters
queryPointThe query point currently being processed.
referenceIndicesThe list of neighbor candidates obtained from hashing the query into all the hash tables and eventually into multiple buckets of the second hash table.
template<typename SortPolicy = NearestNeighborSort>
void mlpack::neighbor::LSHSearch< SortPolicy >::Search ( const arma::mat &  querySet,
const size_t  k,
arma::Mat< size_t > &  resultingNeighbors,
arma::mat &  distances,
const size_t  numTablesToSearch = 0 
)

Compute the nearest neighbors of the points in the given query set and store the output in the given matrices.

The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.

Parameters
querySetSet of query points.
kNumber of neighbors to search for.
resultingNeighborsMatrix storing lists of neighbors for each query point.
distancesMatrix storing distances of neighbors for each query point.
numTablesToSearchThis parameter allows the user to have control over the number of hash tables to be searched. This allows the user to pick the number of tables it can afford for the time available without having to build hashing for every table size. By default, this is set to zero in which case all tables are considered.
template<typename SortPolicy = NearestNeighborSort>
void mlpack::neighbor::LSHSearch< SortPolicy >::Search ( const size_t  k,
arma::Mat< size_t > &  resultingNeighbors,
arma::mat &  distances,
const size_t  numTablesToSearch = 0 
)

Compute the nearest neighbors and store the output in the given matrices.

The matrices will be set to the size of n columns by k rows, where n is the number of points in the query dataset and k is the number of neighbors being searched for.

Parameters
kNumber of neighbors to search for.
resultingNeighborsMatrix storing lists of neighbors for each query point.
distancesMatrix storing distances of neighbors for each query point.
numTablesToSearchThis parameter allows the user to have control over the number of hash tables to be searched. This allows the user to pick the number of tables it can afford for the time available without having to build hashing for every table size. By default, this is set to zero in which case all tables are considered.
template<typename SortPolicy = NearestNeighborSort>
const arma::Mat<size_t>& mlpack::neighbor::LSHSearch< SortPolicy >::SecondHashTable ( ) const
inline

Get the second hash table.

Definition at line 182 of file lsh_search.hpp.

References mlpack::neighbor::LSHSearch< SortPolicy >::secondHashTable.

template<typename SortPolicy = NearestNeighborSort>
const arma::vec& mlpack::neighbor::LSHSearch< SortPolicy >::SecondHashWeights ( ) const
inline

Get the weights of the second hash.

Definition at line 176 of file lsh_search.hpp.

References mlpack::neighbor::LSHSearch< SortPolicy >::secondHashWeights.

template<typename SortPolicy = NearestNeighborSort>
template<typename Archive >
void mlpack::neighbor::LSHSearch< SortPolicy >::Serialize ( Archive &  ar,
const unsigned  int 
)

Serialize the LSH model.

Parameters
arArchive to serialize to.
template<typename SortPolicy = NearestNeighborSort>
void mlpack::neighbor::LSHSearch< SortPolicy >::Train ( const arma::mat &  referenceSet,
const size_t  numProj,
const size_t  numTables,
const double  hashWidth = 0.0,
const size_t  secondHashSize = 99901,
const size_t  bucketSize = 500 
)

Train the LSH model on the given dataset.

This means building new hash tables.

Member Data Documentation

template<typename SortPolicy = NearestNeighborSort>
arma::Col<size_t> mlpack::neighbor::LSHSearch< SortPolicy >::bucketContentSize
private

The number of elements present in each hash bucket; should be secondHashSize.

Definition at line 304 of file lsh_search.hpp.

template<typename SortPolicy = NearestNeighborSort>
arma::Col<size_t> mlpack::neighbor::LSHSearch< SortPolicy >::bucketRowInHashTable
private

For a particular hash value, points to the row in secondHashTable corresponding to this value.

Should be secondHashSize.

Definition at line 308 of file lsh_search.hpp.

template<typename SortPolicy = NearestNeighborSort>
size_t mlpack::neighbor::LSHSearch< SortPolicy >::bucketSize
private

The bucket size of the second hash.

Definition at line 297 of file lsh_search.hpp.

Referenced by mlpack::neighbor::LSHSearch< SortPolicy >::BucketSize().

template<typename SortPolicy = NearestNeighborSort>
size_t mlpack::neighbor::LSHSearch< SortPolicy >::distanceEvaluations
private

The number of distance evaluations.

Definition at line 311 of file lsh_search.hpp.

Referenced by mlpack::neighbor::LSHSearch< SortPolicy >::DistanceEvaluations().

template<typename SortPolicy = NearestNeighborSort>
double mlpack::neighbor::LSHSearch< SortPolicy >::hashWidth
private

The hash width.

Definition at line 288 of file lsh_search.hpp.

template<typename SortPolicy = NearestNeighborSort>
size_t mlpack::neighbor::LSHSearch< SortPolicy >::numProj
private

The number of projections.

Definition at line 277 of file lsh_search.hpp.

template<typename SortPolicy = NearestNeighborSort>
size_t mlpack::neighbor::LSHSearch< SortPolicy >::numTables
private

The number of hash tables.

Definition at line 279 of file lsh_search.hpp.

template<typename SortPolicy = NearestNeighborSort>
arma::mat mlpack::neighbor::LSHSearch< SortPolicy >::offsets
private

The list of the offsets 'b' for each of the projection for each table.

Definition at line 285 of file lsh_search.hpp.

Referenced by mlpack::neighbor::LSHSearch< SortPolicy >::Offsets().

template<typename SortPolicy = NearestNeighborSort>
bool mlpack::neighbor::LSHSearch< SortPolicy >::ownsSet
private

If true, we own the reference set.

Definition at line 274 of file lsh_search.hpp.

template<typename SortPolicy = NearestNeighborSort>
std::vector<arma::mat> mlpack::neighbor::LSHSearch< SortPolicy >::projections
private

The std::vector containing the projection matrix of each table.

Definition at line 282 of file lsh_search.hpp.

Referenced by mlpack::neighbor::LSHSearch< SortPolicy >::NumProjections(), and mlpack::neighbor::LSHSearch< SortPolicy >::Projection().

template<typename SortPolicy = NearestNeighborSort>
const arma::mat* mlpack::neighbor::LSHSearch< SortPolicy >::referenceSet
private

Reference dataset.

Definition at line 272 of file lsh_search.hpp.

Referenced by mlpack::neighbor::LSHSearch< SortPolicy >::ReferenceSet().

template<typename SortPolicy = NearestNeighborSort>
size_t mlpack::neighbor::LSHSearch< SortPolicy >::secondHashSize
private

The big prime representing the size of the second hash.

Definition at line 291 of file lsh_search.hpp.

template<typename SortPolicy = NearestNeighborSort>
arma::Mat<size_t> mlpack::neighbor::LSHSearch< SortPolicy >::secondHashTable
private

The final hash table; should be (< secondHashSize) x bucketSize.

Definition at line 300 of file lsh_search.hpp.

Referenced by mlpack::neighbor::LSHSearch< SortPolicy >::SecondHashTable().

template<typename SortPolicy = NearestNeighborSort>
arma::vec mlpack::neighbor::LSHSearch< SortPolicy >::secondHashWeights
private

The weights of the second hash.

Definition at line 294 of file lsh_search.hpp.

Referenced by mlpack::neighbor::LSHSearch< SortPolicy >::SecondHashWeights().


The documentation for this class was generated from the following file: