REFLEX: Self-Refining Explainable Fact-Checking via Verdict-Anchored Style Control

About

The prevalence of fake news on social media demands automated fact-checking systems that provide accurate verdicts with faithful explanations. However, existing large language model (LLM)-based approaches ignore deceptive misinformation styles in LLM-generated explanations, resulting in unfaithful rationales that can mislead human judgments. They also rely heavily on external knowledge sources, introducing hallucinations and high latency that undermine the reliability and responsiveness crucial for real-time use. To address these challenges, we propose REason-guided Fact-checking with Latent EXplanations (REFLEX), a self-refining paradigm that explicitly controls reasoning style anchored on the verdict. REFLEX uses self-disagreement veracity signals between the backbone model and its fine-tuned variant to construct steering vectors, naturally disentangling fact from style. Experiments on real-world data show that REFLEX achieves state-of-the-art performance under LLaMA-series models with only 465 self-refined samples. Moreover, owing to its transferability, REFLEX yields up to a 7.54% gain on in-the-wild data. Our results further demonstrate that our method effectively mitigates faithful hallucination, thereby guiding the model toward more accurate verdicts than prior work in explainable fact-checking.
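The abstract does not spell out how the steering vectors are built, so the following is only a minimal sketch of one generic way to do it, not the authors' procedure. It assumes that each sample carries a hidden activation from the backbone and from the fine-tuned variant, keeps only the samples where the two models disagree on the verdict, and averages the activation difference. The `Sample` class, its field names, and the scaling factor `alpha` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    # All fields are hypothetical; the abstract does not define a data format.
    backbone_verdict: str        # verdict predicted by the backbone model
    finetuned_verdict: str       # verdict predicted by the fine-tuned variant
    backbone_act: List[float]    # hidden activation taken from the backbone
    finetuned_act: List[float]   # hidden activation taken from the fine-tuned variant


def select_disagreements(samples: List[Sample]) -> List[Sample]:
    """Keep only samples where the two models predict different verdicts."""
    return [s for s in samples if s.backbone_verdict != s.finetuned_verdict]


def steering_vector(samples: List[Sample]) -> List[float]:
    """Mean activation difference (fine-tuned minus backbone) over disagreements."""
    picked = select_disagreements(samples)
    if not picked:
        raise ValueError("no disagreement samples to build a steering vector from")
    dim = len(picked[0].backbone_act)
    total = [0.0] * dim
    for s in picked:
        for i in range(dim):
            total[i] += s.finetuned_act[i] - s.backbone_act[i]
    return [t / len(picked) for t in total]


def steer(hidden: List[float], vector: List[float], alpha: float = 1.0) -> List[float]:
    """Shift a hidden activation along the steering vector, scaled by alpha."""
    return [h + alpha * v for h, v in zip(hidden, vector)]
```

In a real model the activations would come from a chosen transformer layer and the shift would be applied during generation; which layer and what scale work best are empirical questions this sketch does not address.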

Chuyi Kong, Wei Gao, Jing Ma, Hongzhan Lin, Yuxi Sun • 2025

Related benchmarks

Task	Dataset	Result
Veracity Prediction	LIAR RAW	Macro F1 50.59	32
Veracity Prediction	RAWFC	Macro F1 Score 64.99	26
Explanation quality evaluation	LIAR-RAW (test)	ChatGPT Meaningfulness Score 1.65	7
Explanation quality evaluation	RAW-FC	M Score 1.9	7
Explanation quality evaluation	LIAR RAW	Meaningfulness Score 1.9	7
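For readers unfamiliar with the veracity metric, macro F1 is the unweighted mean of the per-class F1 scores, so rare verdict classes count as much as common ones. The sketch below is a plain-Python illustration under that standard definition. The label strings are made up, and the values in the table appear to be on a 0 to 100 scale, whereas this function returns a fraction between 0 and 1.

```python
from typing import List


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative labels only; not taken from LIAR-RAW or RAWFC.
gold = ["true", "true", "false", "half"]
pred = ["true", "false", "false", "half"]
print(round(macro_f1(gold, pred), 4))
```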
