The author describes optimizing a Rust implementation of RaBitQ, an approximate nearest neighbor search algorithm. The first version was built on the nalgebra library and ran slower than expected. Profiling with samply and the Firefox Profiler showed that f32::clone() calls accounted for a significant share of the run time, so the author preallocated memory for the vectors and reused it across iterations. Implementing the binarize_vector function with AVX2 improved QPS by 32% on GIST. Scalar quantization eliminated more f32::clone() calls, and some nalgebra functions were replaced with manual implementations. The author also found faer, another linear algebra crate that is optimized with SIMD and provides better row/column iteration performance. Re-implementing the binary dot product with AVX2 improved QPS by 11% on GIST.
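To make the binarization, buffer reuse, and binary dot product steps concrete, here is a minimal sketch in portable scalar Rust. It is not the author's code: the function names `binarize_into` and `binary_dot_product`, the `u64` bit packing, and the `> 0.0` sign threshold are all assumptions made for illustration, and it uses plain `count_ones()` rather than the AVX2 intrinsics the summary mentions.

```rust
/// Pack the signs of `values` into 64-bit words: bit `i` is set when
/// `values[i] > 0.0` (the threshold is an assumption for this sketch).
/// The result is written into a caller-provided buffer so that the same
/// allocation can be reused on every iteration instead of allocating anew.
fn binarize_into(values: &[f32], out: &mut Vec<u64>) {
    out.clear();
    // One u64 word holds 64 dimensions; round up for a partial last word.
    out.resize((values.len() + 63) / 64, 0);
    for (i, &v) in values.iter().enumerate() {
        if v > 0.0 {
            out[i / 64] |= 1u64 << (i % 64);
        }
    }
}

/// Binary dot product: the number of positions where both bit vectors
/// have a 1, computed as popcount(a AND b) word by word.
fn binary_dot_product(a: &[u64], b: &[u64]) -> u32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x & y).count_ones())
        .sum()
}
```

A caller would create one `Vec<u64>` outside the search loop and pass it to `binarize_into` for each vector, which keeps the hot path free of allocations. A SIMD version would follow the same structure but process several words per instruction.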
dev.to
