Debating Truth: Debate-driven Claim Verification with Multiple Large Language Model Agents
About
State-of-the-art single-agent claim verification methods struggle with complex claims that require nuanced analysis of multifaceted evidence. Inspired by real-world professional fact-checkers, we propose **DebateCV**, the first debate-driven claim verification framework powered by multiple LLM agents. In DebateCV, two *Debaters* argue opposing stances to surface subtle errors in single-agent assessments. A decisive *Moderator* is then required to weigh the evidential strength of conflicting arguments to deliver an accurate verdict. Yet, zero-shot Moderators are biased toward neutral judgments, and no datasets exist for training them. To bridge this gap, we propose **Debate-SFT**, a post-training framework that leverages synthetic data to enhance agents' ability to effectively adjudicate debates for claim verification. Results show that our methods surpass state-of-the-art non-debate approaches in both accuracy (across various evidence conditions) and justification quality.
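The abstract describes the debate protocol only at a high level. The following is a minimal Python sketch of that control flow under stated assumptions: each agent is modeled as a plain function from prompt text to reply text (a stand-in for an LLM call), the two Debaters alternate for a fixed number of rounds, and the Moderator must answer with exactly one of the four AVeriTeC verdict labels. The names `run_debate`, `Turn`, `DebateResult` and `format_history`, the prompt wording, and the default round count are hypothetical illustrations, not the authors' implementation; Debate-SFT training is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumption: an agent is any function mapping a prompt string to a reply string
# (in practice this would wrap an LLM call; here it is left abstract).
Agent = Callable[[str], str]

# Assumption: the Moderator chooses among the four AVeriTeC verdict labels.
LABELS = (
    "Supported",
    "Refuted",
    "Not Enough Evidence",
    "Conflicting Evidence/Cherry-picking",
)


@dataclass
class Turn:
    role: str  # "affirmative" or "negative"
    text: str


@dataclass
class DebateResult:
    transcript: List[Turn]
    verdict: str


def format_history(transcript: Sequence[Turn]) -> str:
    """Render the debate so far as one line per turn."""
    return "\n".join(f"[{t.role}] {t.text}" for t in transcript)


def run_debate(
    claim: str,
    evidence: str,
    affirmative: Agent,
    negative: Agent,
    moderator: Agent,
    rounds: int = 2,
) -> DebateResult:
    """Hypothetical sketch: two Debaters argue opposing stances, then a Moderator rules."""
    transcript: List[Turn] = []
    for _ in range(rounds):
        # Each Debater sees the claim, the evidence and every earlier turn.
        for role, agent in (("affirmative", affirmative), ("negative", negative)):
            prompt = (
                f"Claim: {claim}\nEvidence: {evidence}\n"
                f"Your stance: {role}\nDebate so far:\n{format_history(transcript)}"
            )
            transcript.append(Turn(role, agent(prompt)))

    # The Moderator weighs the full transcript and must commit to a single label.
    judge_prompt = (
        f"Claim: {claim}\nEvidence: {evidence}\n"
        f"Debate:\n{format_history(transcript)}\n"
        f"Answer with exactly one of: {', '.join(LABELS)}"
    )
    verdict = moderator(judge_prompt).strip()
    if verdict not in LABELS:
        raise ValueError(f"Moderator returned an unknown label: {verdict!r}")
    return DebateResult(transcript, verdict)


if __name__ == "__main__":
    # Stub agents with fixed replies, standing in for real LLM calls.
    result = run_debate(
        claim="Example claim",
        evidence="Example evidence",
        affirmative=lambda p: "The evidence supports the claim.",
        negative=lambda p: "The evidence refers to a different period.",
        moderator=lambda p: "Refuted",
    )
    print(result.verdict)          # Refuted
    print(len(result.transcript))  # 4
```

In this sketch, an unrecognized Moderator reply raises an error instead of silently defaulting to "Not Enough Evidence", so a parsing failure cannot masquerade as the neutral verdict that zero-shot Moderators already tend to over-predict.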
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Claim Verification | AVeriTeC Golden (dev) | Accuracy | 83.4 | 28 |
| Claim Verification | AVeriTeC Retrieved (H) (dev) | Accuracy | 72.8 | 28 |
| Claim Verification | AVeriTeC Retrieved (I) (dev) | Accuracy | 73.6 | 28 |
| Justification Quality Evaluation | AVeriTeC Retrieved (H), 50 correctly verified claims | MOS | 3.67 | 6 |