The projection pipeline
Every embedding model produces vectors in its own private space — OpenAI in 1536 dimensions, Cohere in 1024, open-source models in 384 or 768. None of these speak to each other. To compare them, the industry re-embeds everything from scratch every time a model changes.
82D takes a different approach. A single matrix multiply projects any embedding — regardless of its original dimension — into the shared 82-dimensional lingua franca. The originals stay untouched. The projection is a one-way read: it extracts geometric structure without modifying the source.
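In code, the whole pipeline is a single matrix multiply. A minimal sketch with NumPy, assuming one precomputed projection matrix per source dimension; the names `P_1536` and `P_768` and the random-QR construction here are illustrative, not a published API:

```python
import numpy as np

# Illustrative projection matrices, one per source dimension, shape (d_in, 82).
rng = np.random.default_rng(0)
P_1536 = np.linalg.qr(rng.standard_normal((1536, 82)))[0]
P_768 = np.linalg.qr(rng.standard_normal((768, 82)))[0]

openai_vec = rng.standard_normal(1536)  # stand-in for an OpenAI embedding
minilm_vec = rng.standard_normal(768)   # stand-in for an open-source embedding

# One multiply each; both land in the same 82-dimensional space.
# The source vectors are read, never modified.
a = openai_vec @ P_1536
b = minilm_vec @ P_768
print(a.shape, b.shape)  # (82,) (82,)
```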
The question is whether that projection preserves the relationships that matter. It does. Here’s the evidence.
Distance preservation
The projection uses orthogonal QR decomposition, a textbook method from linear algebra that guarantees the projection matrix has orthonormal columns. Orthonormal columns mean the projection can never stretch a distance, so in practice the distance relationships among points in the original space carry over faithfully into 82D.
| Method | Preservation | Detail |
|---|---|---|
| Random Gaussian (Johnson–Lindenstrauss) | ~94% | Standard random projection, probabilistic guarantee |
| Orthogonal QR (82D) | 100% | 0 violations out of 124,750 distance pairs |
Tested on 500 real embeddings, with all 124,750 pairwise distances (500 choose 2) compared before and after projection. Every distance relationship survived intact.
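One way to reproduce this kind of check with NumPy and SciPy; a sketch that assumes a “violation” means a projected pairwise distance exceeding the original (a matrix with orthonormal columns can never stretch a distance):

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 1536))  # stand-in for 500 real embeddings

# QR of a random Gaussian matrix yields orthonormal columns.
Q, _ = np.linalg.qr(rng.standard_normal((1536, 82)))
X82 = X @ Q

d_orig = pdist(X)    # all 124,750 pairwise distances (500 choose 2)
d_proj = pdist(X82)
violations = int(np.sum(d_proj > d_orig + 1e-9))
print(violations)    # 0: the projection never expands a distance
```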
Projection throughput
Because the projection is a single matrix multiply, it runs at the speed of linear algebra — not the speed of neural inference. There’s no model to load, no tokenizer to run, no attention layers to compute.
| Hardware | Throughput | Latency per vector |
|---|---|---|
| Single CPU core | 7M vec/sec | 0.14 µs |
| NVIDIA A100 | 45M vec/sec | 0.02 µs |
That’s 60,283× faster than running a sentence-transformer model to produce the same embedding. The projection step effectively disappears from any performance budget.
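A rough way to sanity-check the CPU figure: a microbenchmark of the bare matmul, assuming float32 vectors and a 1536-to-82 projection. NumPy’s BLAS may use several cores by default, so pin it to one thread (e.g. `OMP_NUM_THREADS=1`) to approximate the single-core number:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
P = np.linalg.qr(rng.standard_normal((1536, 82)))[0].astype(np.float32)
X = rng.standard_normal((100_000, 1536)).astype(np.float32)

t0 = time.perf_counter()
for _ in range(10):
    _ = X @ P  # the entire "embedding" step: one matmul
dt = time.perf_counter() - t0
print(f"{10 * len(X) / dt:,.0f} vec/sec")
```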
In production, the decoder folds directly into the projection matrix — raw input becomes 82D output in a single multiply, with no intermediate embedding step. On an A100 that reaches 287 million vectors per second: 21,594× faster than producing the same vectors through a neural model.
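Folding the decoder in is just composing two linear maps and precomputing their product. A sketch with illustrative shapes; `D` here stands in for a hypothetical linear decoder, not the actual LINGUA weights:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((4096, 1536))  # hypothetical linear decoder: raw -> embedding
P = np.linalg.qr(rng.standard_normal((1536, 82)))[0]  # embedding -> 82D

fused = D @ P  # precomputed once: raw -> 82D in a single multiply

x = rng.standard_normal(4096)
assert np.allclose((x @ D) @ P, x @ fused)  # same result, far fewer FLOPs
```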
GPU search at scale
Once vectors are in 82D, they need to be searchable. NVIDIA’s cuVS CAGRA is among the fastest approximate nearest-neighbor indexes available on GPU. Because 82D vectors are small, the entire index fits comfortably in GPU memory, and the search is nearly exact.
| Metric | Value | Detail |
|---|---|---|
| Index size | 1,000,000 × 82D | Tesla T4, 15 GB VRAM |
| GPU memory used | 0.33 GB | Leaves 14.67 GB free for more data |
| Search throughput | 110,317 QPS | Sustained, after warmup |
| Recall vs. brute-force | 99.96% | Compared against CPU ground truth |
| vs. CPU brute-force | 33,576× | CPU: 3.3 QPS → GPU: 110K QPS |
| Index build time | 12.7 seconds | One-time cost per million vectors |
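For reference, a sketch of building and querying such an index through the cuVS Python bindings (API names as in recent cuVS releases; check the docs for your installed version):

```python
import cupy as cp
from cuvs.neighbors import cagra

# 1M 82D float32 vectors: about 0.33 GB on the GPU.
vectors = cp.random.standard_normal((1_000_000, 82), dtype=cp.float32)

index = cagra.build(cagra.IndexParams(), vectors)  # one-time build

queries = cp.random.standard_normal((10_000, 82), dtype=cp.float32)
distances, neighbors = cagra.search(cagra.SearchParams(), index, queries, 10)
```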
The compact 82D representation is what makes this work. At 328 bytes per vector (82 float32 values), a billion-row index would use roughly 305 GB, within reach of a multi-GPU setup. At full 1536D, the same billion rows would need about 5.6 TB just for the vectors.
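The memory math, spelled out in float32 (4 bytes per dimension) and binary units:

```python
GiB, TiB = 2**30, 2**40
rows = 1_000_000_000
print(82 * 4)                 # 328 bytes per 82D vector
print(82 * 4 * rows / GiB)    # ~305 GiB for a billion rows
print(1536 * 4 * rows / TiB)  # ~5.6 TiB at full 1536D
```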
The bottleneck nobody talks about
Vector databases are fast. In Pinecone, Milvus, Weaviate, and Qdrant alike, the core ANN search takes under a microsecond per query. The bottleneck was never the search. It was the ETL: the neural model that turns raw data into vectors before the search can even begin.
A neural embedding model on an A100 produces about 13,000 vectors per second, roughly 75 µs per vector. The GPU search engine behind it can handle over a million queries per second. The model spends about 75 microseconds getting each vector ready; the search itself takes under one. The search engine sits idle for 99% of the pipeline, waiting for vectors to arrive.
| Pipeline | ETL step | Search step | Total / query |
|---|---|---|---|
| Industry standard | ~75 µs (neural embed, 13K/s) | ~0.74 µs (ANN search) | ~76 µs |
| 82D + cuVS | 0.0035 µs (LINGUA, 287M/s) | 0.74 µs (cuVS CAGRA) | ~0.74 µs |
| 82D standalone | 0.0035 µs (LINGUA, 287M/s) | 95 µs (brute-force, 100% recall) | ~95 µs |
All measured on A100 40GB with 1M records. The standalone option trades search speed for portability — no NVIDIA dependency, exact recall, and the ETL speedup still applies on any hardware that does matrix math.
Try it
Free tier. No credit card. Project your first vectors in 30 seconds.