C++ API Documentation
Public Functions
Construct a new Index object for approximate near neighbor search.
This constructor initializes an Index object with the specified distance metric, dataset size, and maximum number of links per node. It also allows for collecting statistics during the search process.
- Parameters:
dist – The distance metric for the index. Options include l2 (euclidean) and inner product.
dataset_size – The maximum number of vectors that can be inserted in the index.
max_edges_per_node – The maximum number of links per node.
collect_stats – Flag indicating whether to collect statistics during the search process.
Store the new node in the global data structure. In a multi-threaded setting, the index data guard should be held by the caller with an exclusive lock.
- Parameters:
data – The vector to add.
label – The label (meta-data) of the vector.
new_node_id – The id of the new node.
Adds vectors to the index in batches.
This method is responsible for adding vectors in batches, represented by
data
, to the underlying graph. Each vector is associated with a label provided in thelabels
vector. The method efficiently handles concurrent additions by dividing the workload among multiple threads, defined by_num_threads
.The method ensures thread safety by employing locking mechanisms at the node level in the underlying
connectNeighbors
andbeamSearch
methods. This allows multiple threads to safely add vectors to the index without causing data races or inconsistencies in the graph structure.- Parameters:
data – Pointer to the array of vectors to be added.
labels – A vector of labels corresponding to each vector in
data
.ef_construction – Parameter for controlling the size of the dynamic candidate list during the construction of the graph.
num_initializations – Number of initializations for the search algorithm. Must be greater than 0.
- Throws:
std::invalid_argument – Thrown if `num_initializations` is less than or equal to 0.
std::runtime_error – Thrown if the maximum number of nodes in the index is reached.
Adds a single vector to the index.
This method is called internally by
addBatch
for each vector in the batch. The method ensures thread safety by using locking primitives, allowing it to be safely used in a multi-threaded environment.The method first checks if the current number of nodes has reached the maximum capacity. If so, it throws a runtime error. It then locks the index structure to prevent concurrent modifications while allocating a new node. After unlocking, it connects the new node to its neighbors in the graph.
- Parameters:
data – Pointer to the vector data being added.
label – Label associated with the vector.
ef_construction – Parameter controlling the size of the dynamic candidate list during the construction of the graph.
num_initializations – Number of initializations for the search algorithm.
- Throws:
std::runtime_error – Thrown if the maximum number of nodes is reached.
Public Static Functions
Friends
- friend class cereal::access
Subclassed by flatnav::distances::InnerProductDistance< data_type >, flatnav::distances::SquaredL2Distance< data_type >
Public Functions
Public Functions
Public Static Functions
Friends
- friend class ::cereal::access
Public Functions
Public Static Functions
Friends
- friend class cereal::access
Public Functions
Manages a pool of VisitedSet objects in a thread-safe manner.
This class is designed to efficiently provide and manage a pool of VisitedSet instances for concurrent use in multi-threaded environments. It ensures that each visited set can be used by only one thread at a time without the risk of concurrent access and modification.
The class preallocates a specified number of VisitedSet objects to eliminate the overhead of dynamic allocation during runtime. It uses a mutex to synchronize access to the visisted set pool, ensuring that only one thread can modify the pool at any given time. This mechanism provides both thread safety and improved performance by reusing visited_set objects instead of continuously creating and destroying them.
When a thread requires a VisitedSet, it can call
pollAvailablevisited_set()
to retrieve an available visited_set from the pool. If the pool is empty, the function will dynamically allocate a new visited_set to ensure that the requesting thread can proceed with its task. Once the thread has finished using the visited_set, it should return it to the pool by callingpushvisited_set()
.Usage example:
VisitedSetPool visited_pool(10, 1000); VisitedSet* visited_set = visited_set_pool.pollAvailableSet(); // Use the visited_set in a thread... visited_set_pool.pushVisitedSet(visited_set);
Note
The class assumes that all threads will properly return the visited_sets to the pool after use. Failing to return a visited_set will deplete the pool and lead to dynamic allocation, negating the performance benefits.
- Param initial_pool_size:
The number of visited_set objects to initially create and store in the pool.
- Param num_elements:
The size of each VisitedSet, which typically corresponds to the number of nodes or elements that each visited_set is expected to manage.
Public Functions
Warning
doxygenclass: Cannot find class “flatnav::util::DataType” in doxygen xml output for project “FlatNav” from directory: ./doxygen_output/xml