
Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation

About

While large language models have demonstrated exceptional performance across a wide range of tasks, they remain susceptible to hallucinations -- generating plausible yet factually incorrect content. Existing methods for mitigating this risk often rely on sampling multiple full-length generations, which introduces significant response latency and becomes ineffective when the model consistently produces hallucinated outputs with high confidence. To address these limitations, we introduce Monitoring Decoding (MD), a novel framework that dynamically monitors the generation process and selectively applies in-process interventions, focusing on revising the crucial tokens responsible for hallucinations. Instead of waiting for multiple full-length generations to complete, we identify hallucination-prone tokens during generation using a monitor function, and refine these tokens through a tree-based decoding strategy. This approach improves factual accuracy and coherence in the generated output while maintaining efficiency. Experimental results demonstrate that MD consistently outperforms self-consistency-based approaches in both effectiveness and efficiency, achieving higher factual accuracy while significantly reducing computational overhead.
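The abstract's two-stage loop (monitor each decoding step, intervene only when a step looks hallucination-prone) can be sketched in miniature. The code below is a toy illustration of the idea, not the authors' implementation: the language model is replaced by a hand-written next-token table, the monitor function is a simple top-1 confidence check, and the tree-based refinement is a small best-first expansion scored by average log-probability. All names and thresholds here are hypothetical.

```python
import math

# Toy next-token distribution: maps a context tuple to {token: prob}.
# A stand-in for a real LM; purely illustrative.
TOY_LM = {
    (): {"The": 0.9, "A": 0.1},
    ("The",): {"capital": 0.6, "city": 0.4},
    ("The", "capital"): {"is": 0.5, "was": 0.3, "seems": 0.2},
    ("The", "capital", "is"): {"Paris": 0.45, "Lyon": 0.35, "Nice": 0.2},
}

def next_dist(ctx):
    return TOY_LM.get(tuple(ctx), {"<eos>": 1.0})

def monitor(dist):
    """Monitor function (hypothetical): flag low-confidence steps
    via the top-1 probability of the next-token distribution."""
    return max(dist.values())

def tree_refine(ctx, depth=2, width=2):
    """Expand a small tree of candidate continuations and return the
    branch with the highest average log-probability."""
    best_seq, best_score = [], -math.inf

    def expand(prefix, logps):
        nonlocal best_seq, best_score
        if len(prefix) == depth:
            score = sum(logps) / len(logps)
            if score > best_score:
                best_seq, best_score = prefix, score
            return
        dist = next_dist(ctx + prefix)
        for tok, p in sorted(dist.items(), key=lambda kv: -kv[1])[:width]:
            expand(prefix + [tok], logps + [math.log(p)])

    expand([], [])
    return best_seq

def monitoring_decode(max_len=6, threshold=0.5):
    out = []
    for _ in range(max_len):
        dist = next_dist(out)
        if monitor(dist) < threshold:       # hallucination-prone step
            out.extend(tree_refine(out))    # in-process intervention
        else:                               # otherwise decode greedily
            out.append(max(dist, key=dist.get))
        if out and out[-1] == "<eos>":
            break
    return out

print(" ".join(monitoring_decode()))
```

On this toy table, the first three steps pass the confidence check and are decoded greedily; the fourth step falls below the threshold, so the tree expansion compares short candidate branches and commits to the highest-scoring one.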

Yurui Chang, Bochuan Cao, Lu Lin • 2025

Related benchmarks

Task                    | Dataset     | Result                | Rank
Mathematical Reasoning  | GSM8K       | Accuracy 85.2         | 983
Trivia QA               | Trivia QA   | --                    | 32
Question Answering      | Truthful QA | Info Accuracy 98      | 27
Question Answering      | NQ-Open     | Exact Match (EM) 47.4 | 24
