C++ API Documentation

template<typename dist_t, typename label_t>
class Index

Public Functions

inline Index(std::unique_ptr<DistanceInterface<dist_t>> dist, int dataset_size, int max_edges_per_node, bool collect_stats = false)

Construct a new Index object for approximate near neighbor search.

This constructor initializes an Index object with the specified distance metric, dataset size, and maximum number of links per node. It also allows for collecting statistics during the search process.

Parameters:
  • dist – The distance metric for the index. Options include l2 (euclidean) and inner product.

  • dataset_size – The maximum number of vectors that can be inserted in the index.

  • max_edges_per_node – The maximum number of links per node.

  • collect_stats – Flag indicating whether to collect statistics during the search process.

inline ~Index()
inline void buildGraphLinks(const std::string &mtx_filename)
inline std::vector<std::vector<uint32_t>> getGraphOutdegreeTable()
inline void allocateNode(void *data, label_t &label, node_id_t &new_node_id)

Store the new node in the global data structure. In a multi-threaded setting, the index data guard should be held by the caller with an exclusive lock.

Parameters:
  • data – The vector to add.

  • label – The label (meta-data) of the vector.

  • new_node_id – The id of the new node.

template<typename data_type>
inline void addBatch(void *data, std::vector<label_t> &labels, int ef_construction, int num_initializations = 100)

Adds vectors to the index in batches.

This method is responsible for adding vectors in batches, represented by data, to the underlying graph. Each vector is associated with a label provided in the labels vector. The method efficiently handles concurrent additions by dividing the workload among multiple threads, defined by _num_threads.

The method ensures thread safety by employing locking mechanisms at the node level in the underlying connectNeighbors and beamSearch methods. This allows multiple threads to safely add vectors to the index without causing data races or inconsistencies in the graph structure.

Parameters:
  • data – Pointer to the array of vectors to be added.

  • labels – A vector of labels corresponding to each vector in data.

  • ef_construction – Parameter for controlling the size of the dynamic candidate list during the construction of the graph.

  • num_initializations – Number of initializations for the search algorithm. Must be greater than 0.

Throws:
  • std::invalid_argument – Thrown if `num_initializations` is less than or equal to 0.

  • std::runtime_error – Thrown if the maximum number of nodes in the index is reached.

inline void add(void *data, label_t &label, int ef_construction, int num_initializations)

Adds a single vector to the index.

This method is called internally by addBatch for each vector in the batch. The method ensures thread safety by using locking primitives, allowing it to be safely used in a multi-threaded environment.

The method first checks if the current number of nodes has reached the maximum capacity. If so, it throws a runtime error. It then locks the index structure to prevent concurrent modifications while allocating a new node. After unlocking, it connects the new node to its neighbors in the graph.

Parameters:
  • data – Pointer to the vector data being added.

  • label – Label associated with the vector.

  • ef_construction – Parameter controlling the size of the dynamic candidate list during the construction of the graph.

  • num_initializations – Number of initializations for the search algorithm.

Throws:

std::runtime_error – Thrown if the maximum number of nodes is reached.

inline std::vector<dist_label_t> search(const void *query, const int K, int ef_search, int num_initializations = 100)
inline void doGraphReordering(const std::vector<std::string> &reordering_methods)
inline void reorderGOrder(const int window_size = 5)
inline void reorderRCM()
inline void saveIndex(const std::string &filename)
inline void setNumThreads(uint32_t num_threads)
inline uint64_t getTotalIndexMemory() const
inline uint64_t mutexesAllocatedMemory() const
inline uint64_t visitedSetPoolAllocatedMemory() const
inline uint32_t getNumThreads() const
inline size_t maxEdgesPerNode() const
inline size_t dataSizeBytes() const
inline size_t nodeSizeBytes() const
inline size_t maxNodeCount() const
inline size_t currentNumNodes() const
inline size_t dataDimension() const
inline uint64_t distanceComputations() const
inline void resetStats()
inline void getIndexSummary() const

Public Static Functions

static inline std::unique_ptr<Index<dist_t, label_t>> loadIndex(const std::string &filename)

Friends

friend class cereal::access
template<typename T>
class DistanceInterface

Subclassed by flatnav::distances::InnerProductDistance< data_type >, flatnav::distances::SquaredL2Distance< data_type >

Public Functions

inline float distance(const void *x, const void *y, bool asymmetric = false)
inline size_t dimension()
inline size_t dataSize()
inline void getSummary()
inline void transformData(void *destination, const void *src)
template<typename Archive>
inline void serialize(Archive &archive)
template<DataType data_type = DataType::float32>
class SquaredL2Distance : public flatnav::distances::DistanceInterface<SquaredL2Distance<DataType::float32>>

Public Functions

SquaredL2Distance() = default
inline SquaredL2Distance(size_t dim)
inline constexpr size_t getDimension() const
inline constexpr float distanceImpl(const void *x, const void *y, bool asymmetric = false) const

Public Static Functions

static inline std::unique_ptr<SquaredL2Distance<data_type>> create(size_t dim)

Friends

friend class ::cereal::access
template<DataType data_type = DataType::float32>
class InnerProductDistance : public flatnav::distances::DistanceInterface<InnerProductDistance<DataType::float32>>

Public Functions

InnerProductDistance() = default
inline InnerProductDistance(size_t dim)
inline constexpr float distanceImpl(const void *x, const void *y, bool asymmetric = false) const

Public Static Functions

static inline std::unique_ptr<InnerProductDistance<data_type>> create(size_t dim)

Friends

friend class cereal::access
class VisitedSet

Public Functions

inline VisitedSet(const uint32_t size)
inline void prefetch(const uint32_t num) const
inline uint8_t getMark() const
inline void insert(const uint32_t num)
inline uint32_t size() const
inline void clear()
inline bool isVisited(const uint32_t num) const
inline ~VisitedSet()
inline VisitedSet(const VisitedSet &other)
inline VisitedSet(VisitedSet &&other) noexcept
inline VisitedSet &operator=(const VisitedSet &other)
inline VisitedSet &operator=(VisitedSet &&other) noexcept
class VisitedSetPool

Manages a pool of VisitedSet objects in a thread-safe manner.

This class is designed to efficiently provide and manage a pool of VisitedSet instances for concurrent use in multi-threaded environments. It ensures that each visited set can be used by only one thread at a time without the risk of concurrent access and modification.

The class preallocates a specified number of VisitedSet objects to eliminate the overhead of dynamic allocation during runtime. It uses a mutex to synchronize access to the visisted set pool, ensuring that only one thread can modify the pool at any given time. This mechanism provides both thread safety and improved performance by reusing visited_set objects instead of continuously creating and destroying them.

When a thread requires a VisitedSet, it can call pollAvailablevisited_set() to retrieve an available visited_set from the pool. If the pool is empty, the function will dynamically allocate a new visited_set to ensure that the requesting thread can proceed with its task. Once the thread has finished using the visited_set, it should return it to the pool by calling pushvisited_set().

Usage example:

VisitedSetPool visited_pool(10, 1000);
VisitedSet* visited_set = visited_set_pool.pollAvailableSet();
// Use the visited_set in a thread...
visited_set_pool.pushVisitedSet(visited_set);

Note

The class assumes that all threads will properly return the visited_sets to the pool after use. Failing to return a visited_set will deplete the pool and lead to dynamic allocation, negating the performance benefits.

Param initial_pool_size:

The number of visited_set objects to initially create and store in the pool.

Param num_elements:

The size of each VisitedSet, which typically corresponds to the number of nodes or elements that each visited_set is expected to manage.

Public Functions

inline VisitedSetPool(uint32_t initial_pool_size, uint32_t num_elements, uint32_t max_pool_size = std::thread::hardware_concurrency())
inline VisitedSet *pollAvailableSet()
inline size_t poolSize() const
inline void pushVisitedSet(VisitedSet *visited_set)
inline void setPoolSize(uint32_t new_pool_size)
inline uint32_t getPoolSize()
inline ~VisitedSetPool()

Warning

doxygenclass: Cannot find class “flatnav::util::DataType” in doxygen xml output for project “FlatNav” from directory: ./doxygen_output/xml