
A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models

About

Late-interaction models like ColBERT offer competitive performance across various retrieval tasks, but they require storing a dense embedding for each document token, which leads to substantial index storage overhead. Prior work addresses this by pruning low-importance token embeddings based on statistical and empirical measures, but these methods often either lack formal grounding or are ineffective. To address these shortcomings, we introduce a framework grounded in hyperspace geometry and cast token pruning as a Voronoi cell estimation problem in the embedding space. By interpreting each token's influence as a measure of its Voronoi region, our approach enables principled pruning that retains retrieval quality while reducing index size. Our experiments demonstrate that this approach serves not only as a competitive pruning strategy but also as a valuable tool for improving and interpreting token-level behavior within dense retrieval systems.
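To make the idea of "token influence as the measure of a Voronoi region" concrete, the following is a minimal sketch and not the authors' method; the abstract does not specify their estimator. It assumes unit-norm token embeddings, assumes that a token's cell is the set of query directions for which it has the highest dot product, and estimates each cell's share by Monte Carlo sampling of random directions. The function names `estimate_cell_shares` and `prune_tokens`, the sample count, and the toy embeddings are all hypothetical.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def estimate_cell_shares(token_embs, n_samples=20000, seed=0):
    """Monte Carlo estimate of the fraction of query directions 'won' by each token.

    Assumption (not from the paper): queries are uniform on the unit sphere,
    and a token wins a query when it has the largest dot product with it.
    """
    rng = random.Random(seed)
    dim = len(token_embs[0])
    tokens = [normalize(t) for t in token_embs]
    wins = [0] * len(tokens)
    for _ in range(n_samples):
        # An isotropic Gaussian sample has a uniformly distributed direction;
        # the argmax of the dot product does not depend on the query's norm.
        query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        best = max(range(len(tokens)), key=lambda i: dot(tokens[i], query))
        wins[best] += 1
    return [w / n_samples for w in wins]


def prune_tokens(token_embs, keep, **kwargs):
    """Return the indices of the `keep` tokens with the largest estimated shares."""
    shares = estimate_cell_shares(token_embs, **kwargs)
    ranked = sorted(range(len(token_embs)), key=lambda i: shares[i], reverse=True)
    return sorted(ranked[:keep])


# Toy 2-D document: token 1 sits close to token 0, so its cell is small.
doc_tokens = [[1.0, 0.0], [0.99, 0.14], [-1.0, 0.0], [0.0, 1.0]]
shares = estimate_cell_shares(doc_tokens)
kept = prune_tokens(doc_tokens, keep=3)
```

In this toy layout the exact share of token 1 is 1/8 of the circle, the smallest of the four, so it is the token dropped when `keep=3`. A real late-interaction index would use high-dimensional embeddings and a query distribution that reflects actual queries, which this sketch does not attempt to model.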

Yash Kankanampati, Yuxuan Zong, Nadi Tomeh, Benjamin Piwowarski, Joseph Le Roux • 2026

Related benchmarks

Task                   Dataset         Result        Rank
Information Retrieval  BEIR (test)     --            90
Information Retrieval  TREC DL 19      nDCG@10 73.7  61
Information Retrieval  TREC DL 20      nDCG@10 73.1  50
Information Retrieval  MS MARCO (dev)  MRR@10 39.5   15
