BEAR: Budgeted Evidence Allocation for Multi-Document Reasoning

About

We argue that multi-document reasoning is constrained not only by how much text a model can read, but also by how a limited query-time evidence budget is allocated across documents and semantic granularities. Full-context inference exposes the model to broad evidence non-selectively and at high per-query cost, while flat chunk retrieval often returns locally relevant passages that are weakly organized for cross-document synthesis. We present BEAR, a framework for structured evidence allocation that builds hierarchical semantic indices offline and performs coarse-to-fine evidence access at query time through complementary exploration and recovery paths. This coarse-to-fine design can be viewed as structured evidence allocation under a fixed evidence-context budget. Across synthetic and real-world benchmarks, BEAR performs particularly strongly on DragonBall, remains competitive with strong retrieval-based baselines on HotpotQA, and yields the best retrieval-based result on 2Wiki under our evaluated protocol, while operating under substantially smaller query-time evidence budgets than the reported long-context references. Additional analyses suggest that the gains are associated with hierarchy as an allocation substrate together with complementary exploration and recovery, rather than semantic chunking alone.
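To make the idea of coarse-to-fine evidence allocation under a fixed budget concrete, here is a minimal Python sketch. It is not BEAR's algorithm: the abstract does not specify the index structure, the scoring function, or how the exploration and recovery paths share the budget. Everything named here (`Node`, `score`, `allocate`, `beam`, `recovery_share`, the word-overlap relevance measure, and the toy corpus) is a hypothetical assumption chosen for illustration. Exploration is modelled as a top-down descent through a summary tree, and recovery as a flat ranking over all leaf passages that fills whatever budget remains.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str   # summary for an internal node, passage for a leaf
    cost: int   # assumed token cost of placing `text` in the context
    children: list["Node"] = field(default_factory=list)


def score(query: str, text: str) -> float:
    """Toy relevance (an assumption): share of query words found in text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def leaves(node: Node):
    """Yield every leaf passage under `node`, left to right."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def allocate(root: Node, query: str, budget: int,
             beam: int = 2, recovery_share: float = 0.25):
    """Pick leaf passages whose total cost stays within `budget`."""
    selected, spent = [], 0
    explore_budget = int(budget * (1 - recovery_share))

    # Exploration path: descend level by level, keeping the `beam`
    # best-scoring children of each node on the frontier.
    frontier = [root]
    while frontier:
        next_frontier = []
        for node in frontier:
            if not node.children:
                if spent + node.cost <= explore_budget:
                    selected.append(node)
                    spent += node.cost
                continue
            ranked = sorted(node.children,
                            key=lambda c: score(query, c.text), reverse=True)
            next_frontier.extend(ranked[:beam])
        frontier = next_frontier

    # Recovery path: flat ranking over all leaves, so passages under
    # poorly summarised branches can still be picked up.
    ranked_leaves = sorted(leaves(root),
                           key=lambda n: score(query, n.text), reverse=True)
    for leaf in ranked_leaves:
        if score(query, leaf.text) == 0:
            break  # remaining leaves share no words with the query
        if any(leaf is s for s in selected):
            continue
        if spent + leaf.cost <= budget:
            selected.append(leaf)
            spent += leaf.cost
    return selected, spent


# Hypothetical two-branch corpus with equal-cost passages.
root = Node("corpus", 0, [
    Node("company finance reports revenue", 0, [
        Node("Acme revenue grew in 2020", 10),
        Node("Acme hired new staff", 10),
    ]),
    Node("medical records patients", 0, [
        Node("Acme revenue mentioned in patient file", 10),
        Node("patient recovered", 10),
    ]),
])
selected, spent = allocate(root, "Acme revenue", budget=20,
                           beam=1, recovery_share=0.5)
print([n.text for n in selected], spent)
```

In this toy run, exploration follows only the finance branch because its summary matches the query, while the recovery pass picks up the relevant passage under the medical branch that the descent skipped. The fixed split of the budget between the two paths is one of many possible design choices and is assumed here only to keep the sketch short.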

Lin Sun, Linglin Zhang, Jingang Huang, Change Jia, Zhengwei Cheng, Xiangzheng Zhang • 2026

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	HotpotQA (test)	F1 63.43	311
Multi-hop Question Answering	2WikiMultiHopQA (test)	EM 52.5	226
Web-based Question Answering	BrowseComp+	Accuracy 66.6	22
Multi-hop Question Answering	Dragonball DragBalance (test)	Recall 85.8	11
Multi-hop Question Answering	Dragonball DragSingleZh (test)	Recall 74.99	10
