
Index-Preserving Lightweight Token Pruning for Efficient Document Understanding in Vision-Language Models

About

Recent progress in vision-language models (VLMs) has led to impressive results in document understanding tasks, but their high computational demands remain a challenge. To mitigate this computational burden, we propose a lightweight token pruning framework that filters out non-informative background regions from document images before VLM processing. A binary patch-level classifier removes non-text areas, and a max-pooling refinement step recovers fragmented text regions to enhance spatial coherence. Experiments on real-world document datasets demonstrate that our approach substantially lowers computational costs while maintaining comparable accuracy.
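The three-stage pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the 0.5 threshold, and the 3x3 pooling window are assumptions, and the patch scores are taken as given rather than produced by the paper's classifier.

```python
def prune_patches(patch_scores, threshold=0.5, pool=3):
    """Sketch of index-preserving token pruning for document images.

    patch_scores: H x W grid (list of lists) of per-patch text
    probabilities from a binary patch-level classifier (assumed given).
    Returns flat indices of the retained patches, preserving the
    original patch ordering for the VLM's positional embeddings.
    """
    H, W = len(patch_scores), len(patch_scores[0])

    # Step 1: binary patch-level classification (True = text patch,
    # False = non-informative background to prune).
    keep = [[patch_scores[i][j] >= threshold for j in range(W)]
            for i in range(H)]

    # Step 2: max-pooling refinement. A patch survives if any patch in
    # its pool x pool neighbourhood was classified as text, which
    # reconnects fragmented text regions into spatially coherent blocks.
    r = pool // 2
    refined = [[any(keep[ii][jj]
                    for ii in range(max(0, i - r), min(H, i + r + 1))
                    for jj in range(max(0, j - r), min(W, j + r + 1)))
                for j in range(W)]
               for i in range(H)]

    # Step 3: index-preserving pruning. Emit the original flat indices
    # of kept patches instead of re-indexing the pruned sequence.
    return [i * W + j for i in range(H) for j in range(W) if refined[i][j]]
```

For example, a single confident text patch in a 4x4 grid survives pruning together with its 3x3 neighbourhood, while the remaining background patches are dropped.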

Jaemin Son, Sujin Choi, Inyong Yun • 2025

Related benchmarks

Task                        Dataset        Metric     Result  Rank
Information Extraction      SROIE (test)   F1 Score   87.9    62
Document Parsing            SCAN (test)    ANLS       61.8    4
Document Parsing            Photo (test)   ANLS       71      4
Key Information Extraction  CORD (test)    F1 Score   83      4
