RyanCodrai/turbovec: a quantized vector index that fits 10M vectors in 4 GB

The memory math that sells it

turbovec opens with a concrete claim, and it is the right one to lead with: a 10-million-document corpus that needs 31 GB of RAM as float32 fits in 4 GB with turbovec, and searches faster than FAISS. That is the whole pitch in one line. It is a Rust vector index with Python bindings, built on Google Research’s TurboQuant, a data-oblivious quantizer whose distortion comes within a small constant factor of the information-theoretic lower bound, with no codebook training and no separate train phase.
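The headline numbers can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes 768-dimensional embeddings, which the source does not state; it is simply a dimension for which the quoted figures line up. It also ignores any per-vector overhead such as stored norms or ids.

```python
# Back-of-envelope memory estimate for a vector corpus.
# ASSUMPTION: dim=768 is illustrative; the article does not state the dimension.

def index_bytes(n_vectors: int, dim: int, bits_per_coord: int) -> int:
    """Raw payload size in bytes, ignoring per-vector overhead."""
    return n_vectors * dim * bits_per_coord // 8

n, dim = 10_000_000, 768
float32_gb = index_bytes(n, dim, 32) / 1e9
four_bit_gb = index_bytes(n, dim, 4) / 1e9

print(f"float32: {float32_gb:.1f} GB")               # float32: 30.7 GB
print(f"4-bit:   {four_bit_gb:.1f} GB")              # 4-bit:   3.8 GB
print(f"ratio:   {float32_gb / four_bit_gb:.0f}x")   # ratio:   8x
```

Going from 32 bits to 4 bits per coordinate is an 8x reduction regardless of dimension, so the same ratio holds for other embedding sizes.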

The no-train property is the part that changes how you use it. Most quantized indexes make you collect representative vectors, run a training pass, and rebuild as the corpus shifts. turbovec ingests online: you add vectors and they are indexed, with no train step, no parameter tuning, and no rebuilds as the corpus grows. For a RAG store that keeps accreting documents, that removes a recurring operational chore.
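To make "data-oblivious" concrete, here is a toy sketch of the general idea: rotate each vector with a fixed random rotation, then snap every coordinate to a grid that was chosen before any data was seen. This is not turbovec's code. The class name, the uniform grid, and the 3-sigma clipping range are all assumptions made for illustration, and the uniform grid is a stand-in for the optimized scalar codebook a real implementation would use.

```python
import math
import random

def random_rotation(dim, seed=0):
    """Random orthogonal matrix built by Gram-Schmidt on Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(dim):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:  # remove the component along each earlier row
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows

class ToyObliviousQuantizer:
    """Hypothetical illustration: fixed rotation plus a fixed uniform grid."""

    def __init__(self, dim, bits=4, seed=0):
        self.dim = dim
        self.rot = random_rotation(dim, seed)
        self.levels = 2 ** bits
        # After rotation, unit-vector coordinates have std of about 1/sqrt(dim),
        # so the grid range is fixed up front. No data is needed to choose it.
        self.limit = 3.0 / math.sqrt(dim)
        self.step = 2 * self.limit / self.levels

    def encode(self, vec):
        norm = math.sqrt(sum(x * x for x in vec))
        unit = [x / norm for x in vec]
        rotated = [sum(m * x for m, x in zip(row, unit)) for row in self.rot]
        codes = []
        for y in rotated:
            idx = int((y + self.limit) / self.step)
            codes.append(min(max(idx, 0), self.levels - 1))  # clip to the grid
        return codes, norm

    def decode(self, codes, norm):
        # Map each code to its bin centre, rescale, then undo the rotation
        # (the inverse of an orthogonal matrix is its transpose).
        y = [(-self.limit + (c + 0.5) * self.step) * norm for c in codes]
        return [sum(self.rot[i][j] * y[i] for i in range(self.dim))
                for j in range(self.dim)]

quantizer = ToyObliviousQuantizer(dim=32, bits=4)
rng = random.Random(1)
vec = [rng.gauss(0.0, 1.0) for _ in range(32)]
codes, norm = quantizer.encode(vec)
approx = quantizer.decode(codes, norm)
rel_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, approx))) / norm
print(f"relative reconstruction error: {rel_err:.3f}")
```

Because nothing in the constructor depends on the corpus, a new vector can be encoded the moment it arrives, which is the property that makes online ingest possible.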

What it does well

Speed from hand-written SIMD. Per the README’s benchmarks, the NEON kernels beat FAISS IndexPQFastScan by 12 to 20 percent on ARM, and the AVX-512BW kernels match or beat it on x86.
Filtering without a recall penalty. Pass an id allowlist or a slot bitmask to search() and the kernel honors it directly, returning up to k results from the allowed set. There is no over-fetch-then-filter step, which is where many indexes quietly lose recall on selective filters.
Fully local. No managed service and no data leaving your machine or VPC, so you can pair it with an open-source embedding model for an air-gapped RAG stack.
Stable ids. IdMapIndex keeps your external uint64 ids stable across deletes.
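The filtering point is easier to see with a toy comparison. The sketch below is plain Python over a precomputed score list, not turbovec's kernel, and both function names are hypothetical. It contrasts filtering after the top-k cut with filtering during the scan.

```python
import heapq

def topk_post_filter(scores, allowed, k):
    """Take the global top-k first, then drop ids outside the allowlist."""
    top = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return [i for i in top if i in allowed]

def topk_filtered_scan(scores, allowed, k):
    """Skip disallowed ids during the scan, so all k slots go to allowed ids."""
    candidates = (i for i in range(len(scores)) if i in allowed)
    return heapq.nlargest(k, candidates, key=scores.__getitem__)

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
allowed = {2, 5, 7}

print(topk_post_filter(scores, allowed, k=3))    # [2]
print(topk_filtered_scan(scores, allowed, k=3))  # [2, 5, 7]
```

Post-filtering returns one result where three were available. Over-fetching narrows that gap but never closes it for a sufficiently selective filter.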

Install

pip install turbovec

A minimal index, add, search, and persist:

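import numpy as np
from turbovec import TurboQuantIndex

# Placeholder data; float32 arrays of shape (n, dim) are an assumption here.
vectors = np.random.rand(1000, 1536).astype(np.float32)
query = np.random.rand(1, 1536).astype(np.float32)

index = TurboQuantIndex(dim=1536, bit_width=4)
index.add(vectors)                            # indexed on add, no train step
scores, indices = index.search(query, k=10)   # top-10 scores and their indices
index.write("my_index.tv")                    # persist to disk
loaded = TurboQuantIndex.load("my_index.tv")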

If you need ids that survive deletes, use IdMapIndex with add_with_ids and your own uint64 ids.
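Why stable ids matter is clearest with a small model of the bookkeeping. The class below is a hypothetical sketch, not IdMapIndex itself: it tracks only the id-to-slot mapping, and the swap-with-last delete strategy and the remove and ids_for_slots method names are assumptions made for illustration.

```python
class ToyIdMap:
    """Maps caller-chosen ids to internal slots; deletes use swap-with-last."""

    def __init__(self):
        self.slot_to_id = []   # internal slot -> external id
        self.id_to_slot = {}   # external id -> internal slot

    def add_with_ids(self, ids):
        for ext_id in ids:
            if ext_id in self.id_to_slot:
                raise ValueError(f"duplicate id {ext_id}")
            self.id_to_slot[ext_id] = len(self.slot_to_id)
            self.slot_to_id.append(ext_id)

    def remove(self, ext_id):
        slot = self.id_to_slot.pop(ext_id)
        last_id = self.slot_to_id.pop()
        if last_id != ext_id:
            # Move the last entry into the freed slot to keep storage dense.
            self.slot_to_id[slot] = last_id
            self.id_to_slot[last_id] = slot

    def ids_for_slots(self, slots):
        """Translate internal search results back to external ids."""
        return [self.slot_to_id[s] for s in slots]

id_map = ToyIdMap()
id_map.add_with_ids([1001, 1002, 1003])
id_map.remove(1001)                      # slot 0 is now reused by id 1003
print(id_map.ids_for_slots([0, 1]))      # [1003, 1002]
```

Internal slots shift when a vector is deleted, but callers only ever see the external ids, so references stored elsewhere (a database row, a document key) stay valid.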

The tradeoff to go in knowing

Quantization is lossy by definition, and turbovec is honest about where that shows. An issue in the repository documents that the LUT scoring kernel loses up to 1.4 percentage points of recall@1 versus exact math on low-dimensional 4-bit configurations. That is small and expected, but it is the right thing to measure on your own data: at very low bit widths and low dimensions, the approximation is tighter on memory and looser on recall. If your application is recall-critical at low dimensions, test the bit width rather than assuming the default is free. The project is young, with 8 open issues as of 2026-06 and no tagged releases yet, so treat the API as still settling.
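Measuring this on your own data needs only a recall function and two sets of results: ground truth from an exact float32 search and the output of the quantized index. A minimal sketch follows, with made-up neighbour ids standing in for real search results.

```python
def recall_at_k(exact, approx, k=1):
    """Average fraction of the true top-k neighbours found in the approximate top-k."""
    hits = 0
    for true_ids, found_ids in zip(exact, approx):
        hits += len(set(true_ids[:k]) & set(found_ids[:k]))
    return hits / (k * len(exact))

# Illustrative ids only: one row per query, best match first.
exact = [[4, 9, 1], [7, 2, 3], [5, 0, 8], [6, 1, 2]]
approx = [[4, 1, 9], [2, 7, 3], [5, 0, 8], [6, 2, 1]]

print(recall_at_k(exact, approx, k=1))  # 0.75
print(recall_at_k(exact, approx, k=3))  # 1.0
```

The gap between the two numbers is typical of quantized search: the right neighbours are usually retrieved, but near-ties can swap order at the top. Running this at each candidate bit width shows what the memory savings cost in accuracy.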

turbovec versus other vector indexes

	turbovec	FAISS	USearch
Stars	10,565	40,250	4,152
Core	Rust, TurboQuant	C++, many index types	C++, HNSW
Train step	none, online ingest	required for PQ indexes	none
License	MIT	MIT	Apache-2.0

Counts are from GitHub as of June 2026. FAISS is the incumbent, vastly broader in index types and battle-tested, but its product-quantization indexes need a training pass and it is a heavier dependency. USearch is a lean HNSW-based engine, also train-free, optimizing a different point on the speed-memory curve. turbovec’s specific bet is extreme memory compression with no train step and SIMD search that competes with FAISS on speed, which is a narrow but valuable niche for memory-bound local RAG.

When the memory win is decisive

The 31 GB to 4 GB figure is not just a benchmark flex; it changes what runs where. An index that fits in a few gigabytes runs on a laptop, a small VM, or alongside other services on one box, instead of demanding a dedicated high-memory machine. Combined with online ingest and no rebuilds, that makes turbovec a fit for embedded and on-device RAG, where you cannot ship a 31 GB working set and cannot call out to a hosted vector service. That is the niche it owns: local, memory-bound, privacy-bound retrieval, served fast enough that the compression does not cost you latency.

turbovec pairs naturally with a local embedding model served by Ollama for a fully local RAG stack, and with a compression layer like headroom to keep retrieved chunks small.

FAQ

Does turbovec need a training step? No. It ingests vectors online and indexes them as you add, with no train phase, tuning, or rebuilds as the corpus grows.

How much memory does it save? The README cites a 10 million document corpus dropping from 31 GB as float32 to 4 GB, while searching faster than FAISS.

What is the catch? Quantization costs some recall. A documented issue shows up to 1.4 percentage points of recall@1 lost at low-dimensional 4-bit settings, so measure bit width against your accuracy needs.

Is it local? Yes. No managed service and no data leaves your machine, which suits air-gapped RAG with an open-source embedding model.
