
Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models

About

Large Language Models (LLMs) achieve impressive performance across many tasks but remain prone to hallucination, especially in long-form generation where redundant retrieved contexts and lengthy reasoning chains amplify factual errors. Recent studies highlight a critical phenomenon: the closer key information appears to the model outputs, the higher the factual accuracy. However, existing retrieval-augmented language models (RALMs) lack effective mechanisms to ensure this proximity - external evidence is injected into reasoning via multi-turn retrieval, but this cannot ensure key information stays close to the outputs. We propose Micro-Macro Retrieval (M2R), a novel retrieve-while-generate framework to fill this gap. At the macro level, M2R retrieves coarse-grained evidence from external sources; at the micro level, it extracts essential results from a key information repository built during reasoning and reuses them while generating answers. This design directly addresses the key-information-to-output proximity bottleneck, effectively reducing hallucination in long-form tasks. M2R is trained with a curriculum learning-based reinforcement learning strategy using customized rule-based rewards, enabling stable acquisition of retrieval and grounding skills. Extensive experiments across different benchmarks demonstrate the effectiveness of M2R, especially in lengthy-context settings.

Yujie Feng, Jian Li, Zhihan Zhou, Pengfei Xu, Yujia Zhang, Xiaoyu Li, Xiaohui Zhou, Alan Zhao, Xi Chen, Xiao-Ming Wu • 2026
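
The macro/micro split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the token-overlap scorer stands in for a real retriever, and the names `macro_retrieve`, `KeyInfoRepository`, `micro_retrieve`, and `build_answer_prompt` are hypothetical. The sketch also omits the reinforcement learning training and stores each top passage directly, whereas in M2R the model itself decides what enters the repository. It shows only the core idea: facts gathered during reasoning are pulled back and placed immediately before the answer slot.

```python
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercased word set; a toy stand-in for a real retriever's encoder."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - {""}


def overlap(query: str, passage: str) -> int:
    """Number of distinct words shared by the query and the passage."""
    return len(tokens(query) & tokens(passage))


def macro_retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Macro level: coarse-grained evidence from an external source."""
    ranked = sorted(corpus, key=lambda p: overlap(query, p), reverse=True)
    return ranked[:k]


@dataclass
class KeyInfoRepository:
    """Hypothetical store for key results collected during reasoning."""
    facts: list[str] = field(default_factory=list)

    def add(self, fact: str) -> None:
        if fact not in self.facts:
            self.facts.append(fact)

    def micro_retrieve(self, query: str, k: int = 1) -> list[str]:
        """Micro level: return the stored facts most relevant to the query."""
        ranked = sorted(self.facts, key=lambda f: overlap(query, f), reverse=True)
        return ranked[:k]


def build_answer_prompt(question: str, repo: KeyInfoRepository, k: int = 2) -> str:
    """Place micro-retrieved facts directly before the answer slot."""
    facts = repo.micro_retrieve(question, k)
    lines = [f"Question: {question}"]
    lines += [f"Key fact: {f}" for f in facts]
    lines.append("Answer:")
    return "\n".join(lines)


corpus = [
    "Marie Curie was born in Warsaw in 1867.",
    "Warsaw is the capital of Poland.",
    "The Eiffel Tower is in Paris.",
]
question = "In which country was Marie Curie born?"

repo = KeyInfoRepository()
# Two reasoning hops, each backed by one macro retrieval (toy assumption).
for hop_query in [question, "Warsaw is the capital of which country?"]:
    repo.add(macro_retrieve(hop_query, corpus, k=1)[0])

prompt = build_answer_prompt(question, repo, k=2)
print(prompt)
```

In this toy run the two stored facts end up on the lines directly above `Answer:`, regardless of how much other text was produced during the earlier hops.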

Related benchmarks

Task                          Dataset      Metric                     Result  Rank
Question Answering            2Wiki        EM                         50      241
Multi-hop Question Answering  HotpotQA     LLM Judge Score            65.98   72
Question Answering            Bamboogle    EM                         46      61
Question Answering            MuSiQue      EM                         25.5    38
Multi-hop Question Answering  2Wiki        EM                         48.89   16
Multi-hop Question Answering  MuSiQue      EM                         24.12   16
Multi-hop Question Answering  Bamboogle    EM                         44.56   16
Multi-question Reasoning      HotpotQA 3Q  Exact Match Accuracy (3Q)  32      6
Multi-question Reasoning      2Wiki-3Q     Exact Match (EM)           35.8    6
Multi-question Reasoning      MuSiQue-3Q   Exact Match (EM)           17.9    6
Showing 10 of 15 rows
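
Most rows above report exact match (EM). This page does not say how EM is computed for these benchmarks, so the following is only a sketch of the widely used SQuAD-style convention: lowercase the text, strip punctuation and English articles, collapse whitespace, then compare strings. The function names are illustrative, and the normalization actually used for the listed results may differ.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(g) for g in gold_answers)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match one of their references."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, g) for p, g in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under this convention, `em_score(["The Poland.", "Paris"], [["poland"], ["London"]])` counts the first prediction as a match and the second as a miss.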
