WARP: An Efficient Engine for Multi-Vector Retrieval

About

Multi-vector retrieval methods such as ColBERT and its recent variant, the ConteXtualized Token Retriever (XTR), offer high accuracy but face efficiency challenges at scale. To address this, we present WARP, a retrieval engine that substantially improves the efficiency of retrievers trained with the XTR objective through three key innovations: (1) WARP$_\text{SELECT}$ for dynamic similarity imputation; (2) implicit decompression, avoiding costly vector reconstruction during retrieval; and (3) a two-stage reduction process for efficient score aggregation. Combined with highly-optimized C++ kernels, our system reduces end-to-end latency compared to XTR's reference implementation by 41x, and achieves a 3x speedup over the ColBERTv2/PLAID engine, while preserving retrieval quality.
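The abstract names similarity imputation and a two-stage reduction but does not spell out how they work. The following is only a minimal Python sketch of the general XTR-style scoring idea, not WARP's C++ kernels. It assumes each retrieved hit is a (query token index, document id, similarity) triple and that a per-query-token missing-similarity estimate is already available. The function name `score_candidates` and its parameters are hypothetical, and implicit decompression is not shown.

```python
from collections import defaultdict


def score_candidates(hits, num_query_tokens, missing_scores):
    """Aggregate token-level hits into document scores (illustrative only).

    hits: iterable of (query_token_idx, doc_id, similarity) triples.
    num_query_tokens: number of tokens in the query.
    missing_scores: assumed per-query-token estimate used when a document
        has no retrieved token for that query token (the imputed value).
    """
    # Stage 1 (token-level reduction): keep the maximum similarity
    # for each (document, query token) pair.
    best = defaultdict(dict)
    for q_idx, doc_id, sim in hits:
        prev = best[doc_id].get(q_idx)
        if prev is None or sim > prev:
            best[doc_id][q_idx] = sim

    # Stage 2 (document-level reduction): sum over all query tokens,
    # substituting the imputed estimate where no hit was retrieved.
    scores = {}
    for doc_id, per_token in best.items():
        scores[doc_id] = sum(
            per_token.get(q, missing_scores[q]) for q in range(num_query_tokens)
        )
    return scores


# Toy data: document "d2" has no hit for query token 1, so its score
# uses the imputed estimate missing_scores[1] for that token.
hits = [(0, "d1", 0.9), (0, "d1", 0.7), (1, "d1", 0.8), (0, "d2", 0.6)]
scores = score_candidates(hits, num_query_tokens=2, missing_scores=[0.2, 0.3])
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy example "d1" ranks above "d2". How the missing-similarity estimates are chosen is what the abstract attributes to WARP$_\text{SELECT}$. Here they are simply supplied as inputs.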

Jan Luca Scheerer, Matei Zaharia, Christopher Potts, Gustavo Alonso, Omar Khattab • 2025

Related benchmarks

Task                   Dataset              Result                        Rank
Retrieval              MS MARCO V1          Retrieval Latency (ms): 72.8  57
Information Retrieval  LoTTE pooled (test)  Retrieval Time (ms): 39       41
