SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications

About

We present SwiftEmbed, a production-oriented serving system for static token embeddings that achieves 1.12 ms p50 latency for single-text requests while maintaining a 60.6 MTEB average score across 8 representative tasks. Built around the open-source Potion-base-8M distilled model from MinishLab and implemented in Rust, the system delivers 50,000 requests per second through static embedding lookup, mean pooling, and zero-copy IEEE 754 binary serialization. Evaluation demonstrates exceptional duplicate detection performance (90.1% AP) and strong semantic similarity (76.1% Spearman correlation). Performance relative to Sentence-BERT is task-dependent: robust for deduplication and similarity workloads (89-100%), substantially lower for classification and complex retrieval tasks (75%). Domain-specific performance ranges from 75% to 131% of a GloVe-840B baseline. The system targets real-time embedding applications where sub-5 ms latency is operationally critical and where full transformer inference is not feasible.
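The pipeline named in the abstract (static embedding lookup, mean pooling, IEEE 754 binary serialization) can be illustrated with a minimal Rust sketch. This is not the SwiftEmbed implementation: the `StaticEmbedder` type, the whitespace tokenizer, the `HashMap` table, and the helper names `to_le_bytes` and `from_le_bytes` are all assumptions made for illustration, and the serialization here copies bytes rather than being zero-copy.

```rust
use std::collections::HashMap;

/// Hypothetical static embedding table: each token maps to one fixed vector.
/// A real system would load these vectors from a distilled model file.
pub struct StaticEmbedder {
    dim: usize,
    table: HashMap<String, Vec<f32>>,
}

impl StaticEmbedder {
    pub fn new(dim: usize) -> Self {
        StaticEmbedder { dim, table: HashMap::new() }
    }

    /// Register a token vector; panics if the dimension does not match.
    pub fn insert(&mut self, token: &str, vector: Vec<f32>) {
        assert_eq!(vector.len(), self.dim, "vector has wrong dimension");
        self.table.insert(token.to_lowercase(), vector);
    }

    /// Embed a text by looking up each token and mean-pooling the hits.
    /// Assumption: lowercase whitespace tokenization stands in for the
    /// model's real tokenizer; unknown tokens are skipped.
    pub fn embed(&self, text: &str) -> Vec<f32> {
        let mut pooled = vec![0.0f32; self.dim];
        let mut hits = 0usize;
        for token in text.split_whitespace() {
            if let Some(vector) = self.table.get(&token.to_lowercase()) {
                for (acc, x) in pooled.iter_mut().zip(vector.iter()) {
                    *acc += *x;
                }
                hits += 1;
            }
        }
        if hits > 0 {
            let inv = 1.0 / hits as f32;
            for acc in pooled.iter_mut() {
                *acc *= inv;
            }
        }
        pooled // all zeros when no token was found
    }
}

/// Serialize an embedding as raw little-endian IEEE 754 f32 bytes.
/// This version copies; a zero-copy variant would reinterpret the buffer.
pub fn to_le_bytes(embedding: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(embedding.len() * 4);
    for x in embedding {
        out.extend_from_slice(&x.to_le_bytes());
    }
    out
}

/// Decode little-endian f32 bytes back into an embedding.
pub fn from_le_bytes(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

Because inference reduces to table lookups and one averaging pass, the per-request cost grows with the number of tokens and the embedding dimension, with no transformer forward pass involved. A binary payload of 4 bytes per dimension also avoids the cost of formatting and parsing floats as JSON text.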

Edouard Lansiaux, Antoine Simonet, Eric Wiel • 2025

Related benchmarks

Task	Dataset	Metric	Value
Throughput Scalability	General Inference Workload	RPS	5.00e+4	15
Massive Text Embedding Evaluation	MTEB 8 representative tasks including Banking77, SprintDuplicateQuestions, TwitterSemEval, ArguAna (test)	Classification Acc	0.589	8
Text Embedding	MTEB	MTEB Quality Score	60.6	8
Duplicate Detection	Downstream Task Evaluation	Score	1	2
Classification	Downstream Task Evaluation	Score	50	2
Clustering	Downstream Task Evaluation	Score	0.627	2
Semantic Search	Downstream Task Evaluation	Score	87.7	2
