Linear-Time and Constant-Memory Text Embeddings Based on Recurrent Language Models

About

Transformer-based embedding models suffer from quadratic computational and linear memory complexity, limiting their utility for long sequences. We propose recurrent architectures as an efficient alternative, introducing a vertically chunked inference strategy that enables fast embedding generation with memory usage that becomes constant in the input length once it exceeds the vertical chunk size. By fine-tuning Mamba2 models, we demonstrate their viability as general-purpose text embedders, achieving competitive performance across a range of benchmarks while maintaining a substantially smaller memory footprint compared to transformer-based counterparts. We empirically validate the applicability of our inference strategy to Mamba2, RWKV, and xLSTM models, confirming consistent runtime-memory trade-offs across architectures and establishing recurrent models as a compelling alternative to transformers for efficient embedding generation.
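To make the memory claim concrete, here is a minimal sketch of one plausible reading of chunked recurrent inference. It is not the authors' implementation. The toy elementwise recurrence, the mean pooling, and the names `run_layer`, `embed_full`, and `embed_chunked` are all assumptions made for illustration. Each chunk of tokens is pushed through every layer before the next chunk is read, so only one chunk of activations and one state per layer are held at a time.

```python
import math

def run_layer(xs, state, decay):
    """Toy recurrent layer (assumed): h_t = decay * h_{t-1} + (1 - decay) * x_t."""
    ys = []
    for x in xs:
        state = [decay * h + (1.0 - decay) * v for h, v in zip(state, x)]
        ys.append([math.tanh(h) for h in state])
    return ys, state

def embed_full(tokens, decays, dim):
    """Reference: run the whole sequence through each layer, then mean-pool."""
    xs = tokens
    for decay in decays:
        xs, _ = run_layer(xs, [0.0] * dim, decay)
    return [sum(col) / len(xs) for col in zip(*xs)]

def embed_chunked(tokens, decays, dim, chunk_size):
    """Chunked variant: activations are bounded by chunk_size, not sequence length."""
    states = [[0.0] * dim for _ in decays]  # one carried state per layer
    total = [0.0] * dim                     # running sum for mean pooling (assumed pooling)
    count = 0
    for start in range(0, len(tokens), chunk_size):
        xs = tokens[start:start + chunk_size]
        # Push this chunk through all layers before touching the next chunk.
        for i, decay in enumerate(decays):
            xs, states[i] = run_layer(xs, states[i], decay)
        for y in xs:
            total = [t + v for t, v in zip(total, y)]
        count += len(xs)
    return [t / count for t in total]

if __name__ == "__main__":
    dim = 4
    tokens = [[math.sin(0.1 * t + j) for j in range(dim)] for t in range(50)]
    full = embed_full(tokens, [0.9, 0.5], dim)
    chunked = embed_chunked(tokens, [0.9, 0.5], dim, chunk_size=8)
    print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(full, chunked)))
```

Because the per-layer state summarizes everything seen so far, the chunked pass should match the full pass up to floating-point rounding, while its peak activation memory depends on `chunk_size` rather than on the number of tokens.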

Tobias Grantner, Emanuel Sallinger, Martin Flechl • 2026

Related benchmarks

Task	Dataset	Result
Text Embedding Evaluation	MTEB eng v2 (test)	Average Score: 65.2	26
Text Embedding	MTEB Multilingual V2 (test)	Mean Score (TaskType): 51.9	16
Long document retrieval	LongEmbed (test)	Mean over Task: 44.5	4
