
Doc-CoB: Enhancing Document Understanding with Visual Chain-of-Boxes Reasoning

About

Document understanding aims to perform question answering and information extraction over document images, where the visual content is highly information-dense and most queries rely on only a few relevant layout regions. However, existing methods either adopt a one-pass strategy that implicitly assumes all layouts are equally important, or focus excessively on small regions at the cost of losing critical layout information. To address these limitations, we introduce Doc-CoB (Chain-of-Boxes), a simple-yet-effective framework that integrates coarse-to-fine layout-aware visual reasoning into multimodal large language models. Instead of directly zooming into small regions, Doc-CoB progressively focuses on query-relevant layouts while preserving global document information. Specifically, it first selects key layout boxes and then focuses on them for further understanding with visual prompting. To support this paradigm, we introduce two reasoning tasks for box recognition and box reasoning, with an automatic pipeline that constructs 249k training samples with intermediate visual supervision. Extensive experiments on seven benchmarks with four popular models show that Doc-CoB significantly improves performance, demonstrating its effectiveness and wide applicability.
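The two-step flow described above (select key layout boxes, then focus on them while keeping the full page) can be pictured with a minimal sketch. This is not the authors' implementation: `LayoutRegion`, `keyword_overlap`, `select_key_boxes`, and `build_box_prompt` are hypothetical names, and the word-overlap scorer is only a stand-in for the model's learned box recognition.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class LayoutRegion:
    """Hypothetical container for one layout region of a document page."""
    box: Box
    text: str


def keyword_overlap(query: str, region: LayoutRegion) -> int:
    """Toy relevance score: number of lowercase words shared with the query.

    Assumption: a stand-in for the learned box-recognition step.
    """
    return len(set(query.lower().split()) & set(region.text.lower().split()))


def select_key_boxes(
    query: str,
    regions: List[LayoutRegion],
    score: Callable[[str, LayoutRegion], float] = keyword_overlap,
    top_k: int = 2,
) -> List[LayoutRegion]:
    """Coarse step: keep the top_k regions ranked by relevance to the query."""
    ranked = sorted(regions, key=lambda r: score(query, r), reverse=True)
    return ranked[:top_k]


def build_box_prompt(query: str, selected: List[LayoutRegion]) -> str:
    """Fine step: name the selected boxes in the prompt.

    The full page image would still be passed to the model alongside this
    prompt, so global document context is preserved.
    """
    boxes = "; ".join(str(list(r.box)) for r in selected)
    return f"{query}\nFocus on these boxes: {boxes}"


regions = [
    LayoutRegion((0, 0, 100, 20), "Invoice number 123"),
    LayoutRegion((0, 30, 100, 50), "Total amount due 45.00"),
    LayoutRegion((0, 60, 100, 80), "Thank you for your business"),
]
query = "What is the total amount due?"
selected = select_key_boxes(query, regions, top_k=1)
prompt = build_box_prompt(query, selected)
```

In this toy setup the second region is selected because it shares the words "total" and "amount" with the query; in the framework itself, the selection comes from the multimodal model rather than from word overlap.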

Ye Mo, Kai Ye, Xianwei Mao, Zirui Shao, Gang Huang, Bo Zhang, Hangdi Xing, Kehan Chen, Huan Zhou, Zixu Yan, Jiajun Bu, Sheng Zhou • 2025

Related benchmarks

Task                                  Dataset                        Result          Rank
Document Information Extraction       DeepForm (test)                F1 80.06        1 / 22
Document Question Answering           DUDE (test)                    ANLS 65.93      1 / 22
Visually-rich Information Extraction  SROIE (test)                   ANLS 97.18      1 / 19
Document Information Extraction       FUNSD (test)                   ANLS (%) 82.95  1 / 16
Document Information Extraction       VRDU Ad-Buy (test)             Micro F1 93.76  1 / 16
Document Information Extraction       VRDU Registration Form (test)  Micro F1 92.64  1 / 16
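Several of the results above are reported in ANLS (Average Normalized Levenshtein Similarity). As a rough illustration of how that metric is commonly computed, here is a minimal sketch: each prediction is scored against every accepted answer as one minus the normalized edit distance, scores whose normalized distance reaches the threshold (0.5 by convention) are set to zero, and the best score per question is averaged. The lowercasing and whitespace stripping are assumptions about answer normalization, and the function names are illustrative rather than taken from any benchmark's official scorer.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def anls(predictions: List[str], references: List[List[str]],
         threshold: float = 0.5) -> float:
    """Average over questions of the best thresholded similarity."""
    if not predictions:
        return 0.0
    total = 0.0
    for pred, golds in zip(predictions, references):
        best = 0.0
        for gold in golds:
            # Assumed normalization: trim whitespace and lowercase.
            p, g = pred.strip().lower(), gold.strip().lower()
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            sim = 1.0 - nl if nl < threshold else 0.0
            best = max(best, sim)
        total += best
    return total / len(predictions)


exact = anls(["42.00"], [["42.00"]])
near = anls(["total"], [["totals", "sum"]])
miss = anls(["abc"], [["xyz"]])
```

The function returns a value between 0 and 1; the table values appear to be the same quantity scaled to 0 to 100.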
