Debating Truth: Debate-driven Claim Verification with Multiple Large Language Model Agents

About

State-of-the-art single-agent claim verification methods struggle with complex claims that require nuanced analysis of multifaceted evidence. Inspired by real-world professional fact-checkers, we propose DebateCV, the first debate-driven claim verification framework powered by multiple LLM agents. In DebateCV, two Debaters argue opposing stances to surface subtle errors in single-agent assessments. A decisive Moderator then weighs the evidential strength of the conflicting arguments to deliver an accurate verdict. However, zero-shot Moderators are biased toward neutral judgments, and no datasets exist for training them. To bridge this gap, we propose Debate-SFT, a post-training framework that uses synthetic data to improve agents' ability to adjudicate debates for claim verification. Results show that our methods surpass state-of-the-art non-debate approaches in both accuracy (across various evidence conditions) and justification quality.
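The debate protocol described above can be sketched as a simple loop: two Debaters take turns arguing opposite stances on the same claim and evidence, and a Moderator reads the full transcript and picks a single verdict. This is a minimal sketch under stated assumptions, not the authors' implementation. The number of rounds, the prompt wording, and the stopping rule are not given here, so the sketch assumes a fixed number of alternating rounds. `Agent`, `run_debate`, `DebateResult`, and the stub agents are hypothetical names, and the Debate-SFT training step is not modeled. The label set is the four AVeriTeC verdict classes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical agent interface: prompt in, text out. A real system would
# wrap an LLM call here; any plain function works for this sketch.
Agent = Callable[[str], str]

# The four AVeriTeC verdict labels.
LABELS = (
    "Supported",
    "Refuted",
    "Not Enough Evidence",
    "Conflicting Evidence/Cherrypicking",
)


@dataclass
class DebateResult:
    transcript: List[Tuple[str, str]] = field(default_factory=list)  # (speaker, argument)
    verdict: str = ""


def format_history(transcript: List[Tuple[str, str]]) -> str:
    # Render the debate so far as "Speaker: argument" lines.
    return "\n".join(f"{speaker}: {argument}" for speaker, argument in transcript)


def run_debate(claim: str, evidence: str, affirmative: Agent, negative: Agent,
               moderator: Agent, rounds: int = 2) -> DebateResult:
    """Two Debaters argue opposing stances for a fixed number of rounds
    (an assumption); the Moderator then reads the transcript and picks one label."""
    result = DebateResult()
    for _ in range(rounds):
        for name, agent, stance in (("Affirmative", affirmative, "supports"),
                                    ("Negative", negative, "refutes")):
            # Each Debater sees the claim, the evidence, and every earlier turn.
            prompt = (f"Claim: {claim}\nEvidence: {evidence}\n"
                      f"Argue that the evidence {stance} the claim.\n"
                      f"Debate so far:\n{format_history(result.transcript)}")
            result.transcript.append((name, agent(prompt)))

    # The Moderator weighs both sides and must answer with exactly one label.
    moderator_prompt = (f"Claim: {claim}\nEvidence: {evidence}\n"
                        f"Debate:\n{format_history(result.transcript)}\n"
                        f"Weigh the evidential strength of each side and answer "
                        f"with exactly one of: {', '.join(LABELS)}.")
    verdict = moderator(moderator_prompt).strip()
    if verdict not in LABELS:
        raise ValueError(f"Moderator returned an unknown label: {verdict!r}")
    result.verdict = verdict
    return result


# Usage with stub agents (no LLM calls).
if __name__ == "__main__":
    res = run_debate(
        claim="The bridge opened in 1990.",
        evidence="A city record lists the bridge's opening date as 1992.",
        affirmative=lambda p: "The claim is roughly consistent with the record.",
        negative=lambda p: "The record gives 1992, not 1990.",
        moderator=lambda p: "Refuted",
        rounds=2,
    )
    for speaker, argument in res.transcript:
        print(f"{speaker}: {argument}")
    print("Verdict:", res.verdict)
```

Rejecting any Moderator output outside the fixed label set keeps the verdict machine-checkable. A production system might instead retry or parse the answer more leniently.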

Haorui He, Yupeng Li, Dacheng Wen, Yang Chen, Reynold Cheng, Donglong Chen, Francis C. M. Lau · 2025

Related benchmarks

Task	Dataset	Result
Claim Verification	AVeriTeC Golden (dev)	Accuracy 83.4	28
Claim Verification	AVeriTeC Retrieved (H) (dev)	Accuracy 72.8	28
Claim Verification	AVeriTeC Retrieved (I) (dev)	Accuracy 73.6	28
Justification Quality Evaluation	AVeriTeC Retrieved (H), 50 correctly verified claims	MOS 3.67	6

