
RQ-MoE: Residual Quantization via Mixture of Experts for Efficient Input-Dependent Vector Compression

About

Vector quantization is a fundamental tool for compressing high-dimensional embeddings, yet existing multi-codebook methods rely on static codebooks that limit expressiveness under heterogeneous data geometry. While recent dynamic quantizers like QINCo adapt codebooks to individual inputs and improve expressiveness, their strict sequential dependencies create decoding bottlenecks. We propose Residual Quantization via Mixture of Experts (RQ-MoE), a framework combining a two-level MoE with dual-stream quantization to enable input-dependent codebook adaptation for efficient vector quantization. RQ-MoE enables dynamic codebook construction and decouples instruction from quantization, facilitating parallel decoding. Theoretically, we show that standard Residual Quantization and QINCo can be recovered as constrained special cases of RQ-MoE, and derive a guideline for setting expert dimensionality in RQ-MoE. Extensive experiments show that RQ-MoE achieves state-of-the-art or on-par performance in reconstruction and retrieval, while providing 6x-14x faster decoding than prior vector quantization methods. The implementation is available at https://github.com/KDEGroup/RQ-MoE.
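For readers unfamiliar with the baseline, the following is a minimal sketch of standard Residual Quantization with static codebooks, the method the abstract says can be recovered as a constrained special case of RQ-MoE. It is not the RQ-MoE implementation. The codebook values, the function names (`rq_encode`, `rq_decode`, `squared_error`), and the use of squared L2 distance as the error measure are illustrative assumptions.

```python
from typing import List, Tuple

Vector = List[float]
Codebook = List[Vector]


def squared_error(a: Vector, b: Vector) -> float:
    """Squared L2 distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rq_encode(x: Vector, codebooks: List[Codebook]) -> Tuple[List[int], Vector]:
    """Greedy residual quantization with static codebooks.

    At each stage, pick the codeword nearest to the current residual,
    record its index, and subtract it from the residual.
    """
    residual = list(x)
    codes: List[int] = []
    for codebook in codebooks:
        idx = min(range(len(codebook)),
                  key=lambda k: squared_error(residual, codebook[k]))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual


def rq_decode(codes: List[int], codebooks: List[Codebook]) -> Vector:
    """Reconstruct a vector as the sum of the selected codewords.

    With static codebooks each lookup is independent of the others,
    so the terms of this sum could be gathered in parallel.
    """
    dim = len(codebooks[0][0])
    out = [0.0] * dim
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-stage codebooks (coarse, then fine), chosen by hand.
CODEBOOKS: List[Codebook] = [
    [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]],
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
]

x = [4.75, 1.25]
codes, residual = rq_encode(x, CODEBOOKS)
x_hat = rq_decode(codes, CODEBOOKS)
print(codes, x_hat, squared_error(x, x_hat))
```

In this sketch the codebooks are fixed for every input. Input-dependent quantizers instead adapt the codebook to each input, and the abstract attributes their decoding bottleneck to the sequential dependencies this introduces.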

Zhengjia Zhong, Shuyan Ke, Zaizhou Lin, Jiaqi Song, Hongyi Lan, Hui Li • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Vector Quantization | BigANN1M D = 128 (test) | MSE 0.3 | 24 |
| Vector Quantization | Deep1M D = 96 (test) | MSE 0.05 | 24 |
| Vector Quantization | FB-ssnpp1M D = 256 (test) | MSE 6.53 | 20 |
| Vector Quantization | Contriever1M D = 768 (test) | MSE 1.08 | 20 |
| Vector Quantization | BigANN 1M (test) | MSE 0.3 | 12 |
| Vector Quantization | Deep 1M (test) | MSE 0.05 | 12 |
| Vector Quantization | FB-ssnpp 1M (test) | MSE 6.53 | 10 |
| Vector Quantization | Contriever 1M (test) | MSE 1.08 | 10 |
| Vector Quantization | BigANN 1M | MSE 1.1 | 4 |
| Vector Quantization | FB-ssnpp1M | MSE 8.33 | 4 |
Showing 10 of 12 rows
