
MAB-DQA: Addressing Query Aspect Importance in Document Question Answering with Multi-Armed Bandits

About

Document Question Answering (DQA) involves generating answers from a document based on a user's query, representing a key task in document understanding. This task requires interpreting visual layouts, which has prompted recent studies to adopt multimodal Retrieval-Augmented Generation (RAG) that processes page images for answer generation. However, in multimodal RAG, visual DQA struggles to utilize a large number of images effectively, as the retrieval stage often retains only a few candidate pages (e.g., Top-4), causing informative but less visually salient content to be overlooked in favor of common yet low-information pages. To address this issue, we propose a Multi-Armed Bandit-based DQA framework (MAB-DQA) to explicitly model the varying importance of multiple implicit aspects in a query. Specifically, MAB-DQA decomposes a query into aspect-aware subqueries and retrieves an aspect-specific candidate set for each. It treats each subquery as an arm and uses preliminary reasoning results from a small number of representative pages as reward signals to estimate aspect utility. Guided by an exploration-exploitation policy, MAB-DQA dynamically reallocates retrieval budgets toward high-value aspects. Using the most informative pages and their correlations, MAB-DQA then generates the final answer. On four benchmarks, MAB-DQA shows an average improvement of 5%-18% over the state-of-the-art method, consistently enhancing document understanding. Code is available at https://github.com/ElephantOH/MAB-DQA.
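
The budget reallocation described above can be illustrated with a small bandit loop. The abstract does not name the exploration-exploitation policy, so the sketch below assumes standard UCB1 as a stand-in; it is not the authors' implementation. The function name `allocate_budget_ucb`, the `reward_fn` callback (standing in for the preliminary reasoning result on a page), and the exploration constant `c` are all hypothetical names introduced for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple


def allocate_budget_ucb(
    candidates: Dict[str, List[str]],
    reward_fn: Callable[[str, str], float],
    budget: int,
    c: float = 1.0,
) -> Tuple[List[str], Dict[str, int]]:
    """Spend `budget` page evaluations across subqueries (arms) with UCB1.

    candidates: subquery -> ranked candidate pages (assumed already retrieved).
    reward_fn:  hypothetical scorer in [0, 1] for how useful a page is
                for its subquery.
    Returns the selected pages (deduplicated, in pick order) and the number
    of pulls per arm.
    """
    # Copy the ranked lists so the caller's data is not mutated.
    queues = {arm: list(pages) for arm, pages in candidates.items() if pages}
    pulls = {arm: 0 for arm in queues}
    total_reward = {arm: 0.0 for arm in queues}
    selected: List[str] = []
    seen = set()

    for _ in range(budget):
        active = [arm for arm in queues if queues[arm]]
        if not active:
            break  # every candidate list is exhausted

        untried = [arm for arm in active if pulls[arm] == 0]
        if untried:
            # Explore: try each arm once before comparing estimates.
            arm = untried[0]
        else:
            # Exploit with an optimism bonus: mean reward + UCB1 term.
            n_total = sum(pulls.values())
            arm = max(
                active,
                key=lambda a: total_reward[a] / pulls[a]
                + c * math.sqrt(2.0 * math.log(n_total) / pulls[a]),
            )

        page = queues[arm].pop(0)  # next-best page for this subquery
        pulls[arm] += 1
        total_reward[arm] += reward_fn(arm, page)
        if page not in seen:
            seen.add(page)
            selected.append(page)

    return selected, pulls


# Toy usage: aspect "a" always yields useful pages, aspect "b" never does.
toy_candidates = {
    "a": ["a1", "a2", "a3", "a4"],
    "b": ["b1", "b2", "b3", "b4"],
}
toy_pages, toy_pulls = allocate_budget_ucb(
    toy_candidates, lambda arm, page: 1.0 if arm == "a" else 0.0, budget=5
)
print(toy_pages, toy_pulls)
```

In this toy run most of the budget drifts toward the aspect with the higher observed reward, which is the behavior the abstract attributes to the framework; how MAB-DQA actually computes rewards and combines the chosen pages is described in the paper and repository, not here.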

Yixin Xiang, Yunshan Ma, Xiaoyu Du, Yibing Chen, Yanxin Zhang, Jinhui Tang • 2026

Related benchmarks

Task                                    Dataset      Metric          Result  Rank
Document Question Answering             LongDocURL   Accuracy (All)  56.4    30
Retrieval                               MMLongBench  Recall          75.86   18
Retrieval                               LongDocURL   Recall          77.02   18
Multimodal Document Question Answering  PaperTab     Accuracy        26.9    12
Multimodal Document Question Answering  FetaTab      Accuracy        63.8    12
