C++ API Documentation

template<typename dist_t, typename label_t> class Index

Public Functions

inline Index(std::unique_ptr<DistanceInterface<dist_t>> dist, int dataset_size, int max_edges_per_node, bool collect_stats = false, DataType data_type = DataType::float32)

Construct a new Index object for approximate nearest neighbor search.

This constructor initializes an Index object with the specified distance metric, dataset size, and maximum number of links per node. It also allows for collecting statistics during the search process.

Parameters:

dist – The distance metric for the index. Options include squared L2 (Euclidean) and inner product.
dataset_size – The maximum number of vectors that can be inserted in the index.
max_edges_per_node – The maximum number of links per node.
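collect_stats – Flag indicating whether to collect statistics during the search process. Defaults to false.
data_type – The data type of the vectors stored in the index. Defaults to DataType::float32.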

inline ~Index()

inline void buildGraphLinks(const std::string &mtx_filename)

inline std::vector<std::vector<uint32_t>> getGraphOutdegreeTable()

inline void allocateNode(void *data, label_t &label, node_id_t &new_node_id)

Store the new node in the global data structure. In a multi-threaded setting, the index data guard should be held by the caller with an exclusive lock.

Parameters:

data – The vector to add.
label – The label (meta-data) of the vector.
new_node_id – The id of the new node.

template<typename data_type> inline void addBatch(void *data, std::vector<label_t> &labels, int ef_construction, int num_initializations = 100)

Adds vectors to the index in batches.

This method adds the batch of vectors pointed to by data to the underlying graph. Each vector is associated with a label provided in the labels vector. Concurrent additions are handled by dividing the workload among multiple threads, the number of which is given by _num_threads.

The method ensures thread safety by employing locking mechanisms at the node level in the underlying connectNeighbors and beamSearch methods. This allows multiple threads to safely add vectors to the index without causing data races or inconsistencies in the graph structure.

Parameters:

data – Pointer to the array of vectors to be added.
labels – A vector of labels corresponding to each vector in data.
ef_construction – Parameter for controlling the size of the dynamic candidate list during the construction of the graph.
num_initializations – Number of initializations for the search algorithm. Must be greater than 0.

Throws:

std::invalid_argument – Thrown if num_initializations is less than or equal to 0.
std::runtime_error – Thrown if the maximum number of nodes in the index is reached.
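The division of a batch among worker threads can be pictured with the minimal sketch below. It does not use the library: addBatchSketch and its add_one callback are hypothetical names, and the contiguous-chunk split is an assumption, since this page does not state how addBatch actually partitions the work. Only the argument check mirrors the documented std::invalid_argument behaviour.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical helper, not part of the library. It validates
// num_initializations the way addBatch is documented to, then splits the
// index range [0, total) into contiguous chunks and calls add_one(i) for
// every index on one of num_threads worker threads.
// Assumption: add_one does not throw; an exception escaping a worker thread
// would terminate the program.
inline void addBatchSketch(std::size_t total, int num_initializations,
                           unsigned num_threads,
                           const std::function<void(std::size_t)>& add_one) {
  if (num_initializations <= 0) {
    throw std::invalid_argument("num_initializations must be greater than 0");
  }
  if (num_threads == 0) num_threads = 1;

  // Ceiling division so that every index lands in exactly one chunk.
  const std::size_t chunk = (total + num_threads - 1) / num_threads;

  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t begin = std::min(total, t * chunk);
    const std::size_t end = std::min(total, begin + chunk);
    if (begin == end) break;  // nothing left to hand out
    workers.emplace_back([begin, end, &add_one] {
      for (std::size_t i = begin; i < end; ++i) add_one(i);
    });
  }
  for (auto& worker : workers) worker.join();
}
```

In this sketch each index is visited by exactly one thread, so a callback that only touches per-index state needs no extra locking. A callback that modifies shared graph state would still need the node-level locking described above.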

inline void add(void *data, label_t &label, int ef_construction, int num_initializations)

Adds a single vector to the index.

This method is called internally by addBatch for each vector in the batch. The method ensures thread safety by using locking primitives, allowing it to be safely used in a multi-threaded environment.

The method first checks if the current number of nodes has reached the maximum capacity. If so, it throws a runtime error. It then locks the index structure to prevent concurrent modifications while allocating a new node. After unlocking, it connects the new node to its neighbors in the graph.

Parameters:

data – Pointer to the vector data being added.
label – Label associated with the vector.
ef_construction – Parameter controlling the size of the dynamic candidate list during the construction of the graph.
num_initializations – Number of initializations for the search algorithm.

Throws:

std::runtime_error – Thrown if the maximum number of nodes is reached.
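The sequence described above (capacity check, exclusive lock around node allocation, unlock, then neighbor connection) can be sketched as follows. TinyIndexSketch and its members are hypothetical stand-ins, not the library's implementation. The use of std::shared_mutex as the index data guard and the second capacity check under the lock are assumptions of this sketch.

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the index. Only the capacity check and the
// locking order are modelled; there is no graph and no distance metric.
class TinyIndexSketch {
 public:
  explicit TinyIndexSketch(std::size_t max_nodes) : max_nodes_(max_nodes) {}

  // Stores the label and returns the id assigned to the new node.
  std::size_t add(int label) {
    // Step 1: reject the insertion if the index is already full.
    if (currentNumNodes() >= max_nodes_) {
      throw std::runtime_error("maximum number of nodes reached");
    }
    std::size_t new_node_id = 0;
    {
      // Step 2: hold the guard exclusively while the node is allocated.
      std::unique_lock<std::shared_mutex> guard(index_data_guard_);
      // Precaution added by this sketch: another thread may have filled the
      // last slot between step 1 and acquiring the lock.
      if (labels_.size() >= max_nodes_) {
        throw std::runtime_error("maximum number of nodes reached");
      }
      new_node_id = labels_.size();
      labels_.push_back(label);
    }  // Step 3: the guard is released here.
    // Step 4 (omitted): connect the new node to its neighbors, which the
    // text above says happens after unlocking.
    return new_node_id;
  }

  std::size_t currentNumNodes() const {
    std::shared_lock<std::shared_mutex> guard(index_data_guard_);
    return labels_.size();
  }

 private:
  std::size_t max_nodes_;
  mutable std::shared_mutex index_data_guard_;
  std::vector<int> labels_;
};
```

Keeping the exclusive section limited to allocation means other threads are blocked only briefly, while the more expensive neighbor search and linking step can run concurrently.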

inline std::vector<dist_label_t> search(const void *query, const int K, int ef_search, int num_initializations = 100)

inline void doGraphReordering(const std::vector<std::string> &reordering_methods)

inline void reorderGOrder(const int window_size = 5)

inline void reorderRCM()

inline void saveIndex(const std::string &filename)

inline void setNumThreads(uint32_t num_threads)

inline uint64_t getTotalIndexMemory() const

inline uint64_t mutexesAllocatedMemory() const

inline uint64_t visitedSetPoolAllocatedMemory() const

inline uint32_t getNumThreads() const

inline size_t maxEdgesPerNode() const

inline size_t dataSizeBytes() const

inline size_t nodeSizeBytes() const

inline size_t maxNodeCount() const

inline size_t currentNumNodes() const

inline size_t dataDimension() const

inline uint64_t distanceComputations() const

inline DataType getDataType() const

inline void resetStats()

inline void getIndexSummary() const

Public Static Functions

static inline std::unique_ptr<Index<dist_t, label_t>> loadIndex(const std::string &filename)

Friends

friend class cereal::access

template<typename T> class DistanceInterface

Subclassed by flatnav::distances::InnerProductDistance< data_type >, flatnav::distances::SquaredL2Distance< data_type >

Public Functions

inline float distance(const void *x, const void *y, bool asymmetric = false)

inline size_t dimension()

inline size_t dataSize()

inline void getSummary()

inline DataType getDataType()

inline void transformData(void *destination, const void *src)

template<typename Archive> inline void serialize(Archive &archive)

template<DataType data_type = DataType::float32> class SquaredL2Distance : public flatnav::distances::DistanceInterface<SquaredL2Distance<DataType::float32>>

Public Functions

SquaredL2Distance() = default

inline SquaredL2Distance(size_t dim)

inline constexpr size_t getDimension() const

inline constexpr float distanceImpl(const void *x, const void *y, bool asymmetric = false) const

inline DataType getDataTypeImpl() const

Public Static Functions

static inline std::unique_ptr<SquaredL2Distance<data_type>> create(size_t dim)

Friends

friend class ::cereal::access

template<DataType data_type = DataType::float32> class InnerProductDistance : public flatnav::distances::DistanceInterface<InnerProductDistance<DataType::float32>>

Public Functions

InnerProductDistance() = default

inline InnerProductDistance(size_t dim)

inline constexpr float distanceImpl(const void *x, const void *y, bool asymmetric = false) const

inline DataType getDataTypeImpl() const

Public Static Functions

static inline std::unique_ptr<InnerProductDistance<data_type>> create(size_t dim)

Friends

friend class cereal::access

class VisitedSet

Public Functions

inline VisitedSet(const uint32_t size)

inline void prefetch(const uint32_t num) const

inline uint8_t getMark() const

inline void insert(const uint32_t num)

inline uint32_t size() const

inline void clear()

inline bool isVisited(const uint32_t num) const

inline ~VisitedSet()

inline VisitedSet(const VisitedSet &other)

inline VisitedSet(VisitedSet &&other) noexcept

inline VisitedSet &operator=(const VisitedSet &other)

inline VisitedSet &operator=(VisitedSet &&other) noexcept

class VisitedSetPool

Manages a pool of VisitedSet objects in a thread-safe manner.

This class is designed to efficiently provide and manage a pool of VisitedSet instances for concurrent use in multi-threaded environments. It ensures that each visited set can be used by only one thread at a time without the risk of concurrent access and modification.

The class preallocates a specified number of VisitedSet objects to avoid the overhead of dynamic allocation at runtime. It uses a mutex to synchronize access to the visited set pool, ensuring that only one thread can modify the pool at any given time. This provides thread safety and improves performance by reusing VisitedSet objects instead of repeatedly creating and destroying them.

When a thread requires a VisitedSet, it can call pollAvailableSet() to retrieve an available VisitedSet from the pool. If the pool is empty, the function dynamically allocates a new VisitedSet so that the requesting thread can proceed with its task. Once the thread has finished using the VisitedSet, it should return it to the pool by calling pushVisitedSet().

Usage example:

// Preallocate 10 visited sets, each sized for 1000 elements.
VisitedSetPool visited_set_pool(10, 1000);
VisitedSet* visited_set = visited_set_pool.pollAvailableSet();
// Use the visited_set in a thread...
visited_set_pool.pushVisitedSet(visited_set);

Note

The class assumes that all threads return each VisitedSet to the pool after use. Failing to return a VisitedSet depletes the pool and forces dynamic allocation on later requests, negating the performance benefits.

Parameters:

initial_pool_size – The number of VisitedSet objects to initially create and store in the pool.
num_elements – The size of each VisitedSet, which typically corresponds to the number of nodes or elements that each VisitedSet is expected to track.
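One way to make sure a borrowed set always goes back to the pool is a scope guard. The sketch below is self-contained and does not use the library: SetSketch, PoolSketch, BorrowedSet and idleCount are hypothetical names. Unlike the raw-pointer interface listed on this page, the stand-in pool hands out std::unique_ptr, which is a design choice of the sketch. Only the method names pollAvailableSet and pushVisitedSet, and the allocate-when-empty behaviour, follow the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a visited set: one mark byte per element.
struct SetSketch {
  explicit SetSketch(std::uint32_t num_elements) : marks(num_elements, 0) {}
  std::vector<std::uint8_t> marks;
};

// Hypothetical stand-in for the pool: a mutex-protected stack of idle sets.
class PoolSketch {
 public:
  PoolSketch(std::uint32_t initial_pool_size, std::uint32_t num_elements)
      : num_elements_(num_elements) {
    for (std::uint32_t i = 0; i < initial_pool_size; ++i) {
      idle_.push_back(std::make_unique<SetSketch>(num_elements_));
    }
  }

  // Hands out an idle set, or allocates a fresh one if none is idle.
  std::unique_ptr<SetSketch> pollAvailableSet() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (idle_.empty()) return std::make_unique<SetSketch>(num_elements_);
    std::unique_ptr<SetSketch> set = std::move(idle_.back());
    idle_.pop_back();
    return set;
  }

  // Puts a set back so that a later caller can reuse it.
  void pushVisitedSet(std::unique_ptr<SetSketch> set) {
    std::lock_guard<std::mutex> guard(mutex_);
    idle_.push_back(std::move(set));
  }

  // Number of sets currently sitting idle in the pool.
  std::size_t idleCount() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return idle_.size();
  }

 private:
  std::uint32_t num_elements_;
  mutable std::mutex mutex_;
  std::vector<std::unique_ptr<SetSketch>> idle_;
};

// Scope guard: borrows a set on construction and returns it on destruction,
// including when the enclosing scope is left through an exception.
class BorrowedSet {
 public:
  explicit BorrowedSet(PoolSketch& pool)
      : pool_(pool), set_(pool.pollAvailableSet()) {}
  ~BorrowedSet() { pool_.pushVisitedSet(std::move(set_)); }
  BorrowedSet(const BorrowedSet&) = delete;
  BorrowedSet& operator=(const BorrowedSet&) = delete;
  SetSketch* operator->() { return set_.get(); }

 private:
  PoolSketch& pool_;
  std::unique_ptr<SetSketch> set_;
};
```

With a guard like this, a search routine would declare a BorrowedSet at the top of its scope and never call the push function by hand. Sets allocated on demand are also returned, so in this sketch the pool grows to the peak number of concurrent borrowers.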

Public Functions

inline VisitedSetPool(uint32_t initial_pool_size, uint32_t num_elements, uint32_t max_pool_size = std::thread::hardware_concurrency())

inline VisitedSet *pollAvailableSet()

inline size_t poolSize() const

inline void pushVisitedSet(VisitedSet *visited_set)

inline void setPoolSize(uint32_t new_pool_size)

inline uint32_t getPoolSize()

inline ~VisitedSetPool()
