The projection pipeline
Every embedding model produces vectors in its own private space — OpenAI in 1536 dimensions, Cohere in 1024, open-source models in 384 or 768. None of these speak to each other. To compare them, the industry re-embeds everything from scratch every time a model changes.
82D takes a different approach. A single matrix multiply projects any embedding — regardless of its original dimension — into the shared 82-dimensional lingua franca. The originals stay untouched. The projection is a one-way read: it extracts geometric structure without modifying the source.
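In code, the whole pipeline is a single matrix multiply. A minimal sketch with NumPy, assuming one precomputed projection matrix per source dimension; the names `P_1536` and `P_768` and the random-QR construction here are illustrative, not a published API:

```python
import numpy as np

# Illustrative projection matrices, one per source dimension, shape (d_in, 82).
rng = np.random.default_rng(0)
P_1536 = np.linalg.qr(rng.standard_normal((1536, 82)))[0]
P_768 = np.linalg.qr(rng.standard_normal((768, 82)))[0]

openai_vec = rng.standard_normal(1536)  # stand-in for an OpenAI embedding
minilm_vec = rng.standard_normal(768)   # stand-in for an open-source embedding

# One multiply each; both land in the same 82-dimensional space.
# The source vectors are read, never modified.
a = openai_vec @ P_1536
b = minilm_vec @ P_768
print(a.shape, b.shape)  # (82,) (82,)
```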
The question is whether that projection preserves the relationships that matter. It does. Here’s the evidence.
Distance preservation
The projection uses orthogonal QR decomposition, a textbook method from linear algebra that guarantees the projection matrix has orthonormal columns. Orthonormal columns mean the projection can never stretch a distance, so in practice the distance relationships among points in the original space carry over faithfully into 82D.
| Method | Preservation | Detail |
|---|---|---|
| Random Gaussian (Johnson–Lindenstrauss) | ~94% | Standard random projection, probabilistic guarantee |
| Orthogonal QR (82D) | 100% | 0 violations out of 124,750 distance pairs |
Tested on 500 real embeddings, with all 124,750 pairwise distances (500 choose 2) compared before and after projection. Every distance relationship survived intact.
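One way to reproduce this kind of check with NumPy and SciPy; a sketch that assumes a “violation” means a projected pairwise distance exceeding the original (a matrix with orthonormal columns can never stretch a distance):

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 1536))  # stand-in for 500 real embeddings

# QR of a random Gaussian matrix yields orthonormal columns.
Q, _ = np.linalg.qr(rng.standard_normal((1536, 82)))
X82 = X @ Q

d_orig = pdist(X)    # all 124,750 pairwise distances (500 choose 2)
d_proj = pdist(X82)
violations = int(np.sum(d_proj > d_orig + 1e-9))
print(violations)    # 0: the projection never expands a distance
```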
Projection throughput
Because the projection is a single matrix multiply, it runs at the speed of linear algebra — not the speed of neural inference. There’s no model to load, no tokenizer to run, no attention layers to compute.
| Hardware | Throughput | Latency per vector |
|---|---|---|
| Single CPU core | 7M vec/sec | 0.14 µs |
| NVIDIA A100 | 45M vec/sec | 0.02 µs |
That’s 60,283× faster than running a sentence-transformer model to produce the same embedding. The projection step effectively disappears from any performance budget.
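A rough way to sanity-check the CPU figure: a microbenchmark of the bare matmul, assuming float32 vectors and a 1536-to-82 projection. NumPy’s BLAS may use several cores by default, so pin it to one thread (e.g. `OMP_NUM_THREADS=1`) to approximate the single-core number:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
P = np.linalg.qr(rng.standard_normal((1536, 82)))[0].astype(np.float32)
X = rng.standard_normal((100_000, 1536)).astype(np.float32)

t0 = time.perf_counter()
for _ in range(10):
    _ = X @ P  # the entire "embedding" step: one matmul
dt = time.perf_counter() - t0
print(f"{10 * len(X) / dt:,.0f} vec/sec")
```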
In production, the decoder folds directly into the projection matrix — raw input becomes 82D output in a single multiply, with no intermediate embedding step. On an A100 that reaches 287 million vectors per second: 21,594× faster than producing the same vectors through a neural model.
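Folding the decoder in is just composing two linear maps and precomputing their product. A sketch with illustrative shapes; `D` here stands in for a hypothetical linear decoder, not the actual LINGUA weights:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((4096, 1536))  # hypothetical linear decoder: raw -> embedding
P = np.linalg.qr(rng.standard_normal((1536, 82)))[0]  # embedding -> 82D

fused = D @ P  # precomputed once: raw -> 82D in a single multiply

x = rng.standard_normal(4096)
assert np.allclose((x @ D) @ P, x @ fused)  # same result, far fewer FLOPs
```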
GPU search at scale
Once vectors are in 82D, they need to be searchable. NVIDIA’s cuVS CAGRA is among the fastest approximate nearest-neighbor indexes available on GPU. Because 82D vectors are small, the entire index fits comfortably in GPU memory, and the search is nearly exact.
| Metric | Value | Detail |
|---|---|---|
| Index size | 1,000,000 × 82D | Tesla T4, 15 GB VRAM |
| GPU memory used | 0.33 GB | Leaves 14.67 GB free for more data |
| Search throughput | 110,317 QPS | Sustained, after warmup |
| Recall vs. brute-force | 99.96% | Compared against CPU ground truth |
| vs. CPU brute-force | 33,576× | CPU: 3.3 QPS → GPU: 110K QPS |
| Index build time | 12.7 seconds | One-time cost per million vectors |
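For reference, a sketch of building and querying such an index through the cuVS Python bindings (API names as in recent cuVS releases; check the docs for your installed version):

```python
import cupy as cp
from cuvs.neighbors import cagra

# 1M 82D float32 vectors: about 0.33 GB on the GPU.
vectors = cp.random.standard_normal((1_000_000, 82), dtype=cp.float32)

index = cagra.build(cagra.IndexParams(), vectors)  # one-time build

queries = cp.random.standard_normal((10_000, 82), dtype=cp.float32)
distances, neighbors = cagra.search(cagra.SearchParams(), index, queries, 10)
```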
The compact 82D representation is what makes this work. At 328 bytes per vector (82 float32 values), a billion-row index would use roughly 305 GB, within reach of a multi-GPU setup. At full 1536D, the same billion rows would need about 5.6 TB just for the vectors.
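The memory math, spelled out in float32 (4 bytes per dimension) and binary units:

```python
GiB, TiB = 2**30, 2**40
rows = 1_000_000_000
print(82 * 4)                 # 328 bytes per 82D vector
print(82 * 4 * rows / GiB)    # ~305 GiB for a billion rows
print(1536 * 4 * rows / TiB)  # ~5.6 TiB at full 1536D
```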
The bottleneck nobody talks about
Vector databases are fast. In Pinecone, Milvus, Weaviate, and Qdrant alike, the core ANN search takes under a microsecond per query. The bottleneck was never the search. It was the ETL: the neural model that turns raw data into vectors before the search can even begin.
A neural embedding model on an A100 produces about 13,000 vectors per second, roughly 75 µs per vector. The GPU search engine behind it can handle over a million queries per second. The model spends about 75 microseconds getting each vector ready; the search itself takes under one. The search engine sits idle for 99% of the pipeline, waiting for vectors to arrive.
| Pipeline | ETL step | Search step | Total / query |
|---|---|---|---|
| Industry standard | ~75 µs (neural embed, 13K/s) | ~0.74 µs (ANN search) | ~76 µs |
| 82D + cuVS | 0.0035 µs (LINGUA, 287M/s) | 0.74 µs (cuVS CAGRA) | ~0.74 µs |
| 82D standalone | 0.0035 µs (LINGUA, 287M/s) | 95 µs (brute-force, 100% recall) | ~95 µs |
All measured on A100 40GB with 1M records. The standalone option trades search speed for portability — no NVIDIA dependency, exact recall, and the ETL speedup still applies on any hardware that does matrix math.
Try it
Free tier. No credit card. Project your first vectors in 30 seconds.