Look in the Middle: Structural Anchor Pruning for Scalable Visual RAG Indexing

About

Recent Vision-Language Models (e.g., ColPali) enable fine-grained Visual Document Retrieval (VDR) but incur prohibitive index vector size overheads. Training-free pruning solutions (e.g., EOS-attention-based methods) can reduce index vector size by approximately 60% without model adaptation, but often underperform random selection in high-compression scenarios (> 80%). Prior research (e.g., Light-ColPali) attributes this to visual token importance being inherently query-dependent, which calls the feasibility of training-free pruning into question. In this work, we propose Structural Anchor Pruning (SAP), a training-free pruning method that identifies key visual patches from middle layers to achieve high-performance compression. We also introduce the Oracle Score Retention (OSR) protocol to evaluate how layer-wise information affects compression efficiency. Evaluations on the ViDoRe benchmark demonstrate that SAP reduces index vectors by over 90% while maintaining robust retrieval fidelity, providing a highly scalable solution for Visual RAG. Furthermore, our OSR-based analysis reveals that semantic structural anchor patches persist in the middle layers, whereas traditional pruning solutions focus on the final layer, where structural signals dissipate.
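To make the index-size trade-off concrete, here is a minimal NumPy sketch of pruning a page's patch vectors and re-scoring it with ColPali-style late interaction (MaxSim). It is an illustration under stated assumptions, not the paper's implementation: the 1024 x 128 page shape, the helper names `maxsim_score` and `prune_top_k`, and the random `importance` scores are all hypothetical. In SAP the importance signal would come from middle-layer information, which the abstract does not specify in detail. The printed ratio is a simple illustrative measure and should not be read as the paper's OSR definition.

```python
import numpy as np


def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction score: best-matching doc vector per query vector, summed."""
    sims = query_vecs @ doc_vecs.T          # shape (n_query, n_doc)
    return float(sims.max(axis=1).sum())


def prune_top_k(doc_vecs: np.ndarray, importance: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep the patch vectors with the highest importance scores (hypothetical helper)."""
    k = max(1, int(round(keep_ratio * len(doc_vecs))))
    keep = np.sort(np.argsort(importance)[-k:])   # preserve original patch order
    return doc_vecs[keep]


rng = np.random.default_rng(0)

# Assumed shapes for illustration only: 1024 patch vectors of dimension 128.
doc_vecs = rng.standard_normal((1024, 128)).astype(np.float32)
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

query_vecs = rng.standard_normal((16, 128)).astype(np.float32)
query_vecs /= np.linalg.norm(query_vecs, axis=1, keepdims=True)

# Placeholder importance scores; a real method would derive these from the model.
importance = rng.random(1024)

pruned = prune_top_k(doc_vecs, importance, keep_ratio=0.10)   # ~90% fewer index vectors
full_score = maxsim_score(query_vecs, doc_vecs)
pruned_score = maxsim_score(query_vecs, pruned)

print(f"vectors kept: {len(pruned)} of {len(doc_vecs)}")
print(f"score retained after pruning: {pruned_score / full_score:.3f}")
```

Because MaxSim takes a maximum over document vectors, pruning can only lower (or preserve) each query token's best match, so the quality of the importance signal determines how much of the full score survives at a given keep ratio.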

Zhuchenyang Liu, Ziyu Hu, Yao Zhang, Yu Xiao • 2026

Related benchmarks

Task | Dataset | Result | Rank
Visual document retrieval | ViDoRe Avg. across 4 datasets v2 | Full NDCG: 0.58 | 45