Entailed Opinion Matters: Improving the Fact-Checking Performance of Language Models by Relying on their Entailment Ability
About
Automated fact-checking has been a challenging task for the research community. Prior work has explored various strategies, such as end-to-end training, retrieval-augmented generation, and prompt engineering, to build robust fact-checking systems. However, the accuracy of these systems has not been high enough for real-world deployment. In contrast, we propose a new learning paradigm in which evidence classifications and entailed justifications produced by generative language models (GLMs) are used to train encoder-only language models (ELMs). We conducted a rigorous set of experiments, comparing our approach with recent works as well as various prompting and fine-tuning strategies. Additionally, we performed ablation studies, error analysis, a quality analysis of model explanations, and a domain generalisation study to provide a comprehensive understanding of our approach.
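The paradigm above can be sketched as a simple data-construction step: a GLM first labels each evidence piece and writes an entailed justification, and these outputs are packed together with the claim into the text on which an ELM classifier is fine-tuned. The function and special tokens below are purely illustrative assumptions, not the released implementation:

```python
def build_elm_input(claim, evidence_verdicts, justification):
    """Pack the claim, per-evidence GLM verdicts, and the GLM's entailed
    justification into a single input string for an encoder-only model.
    Special tokens ([CLAIM], [EVIDENCE], ...) are hypothetical markers."""
    evidence_part = " ".join(
        f"[EVIDENCE] {text} [VERDICT] {verdict}"
        for text, verdict in evidence_verdicts
    )
    return f"[CLAIM] {claim} {evidence_part} [JUSTIFICATION] {justification}"


example = build_elm_input(
    "The Eiffel Tower is in Berlin.",
    [("The Eiffel Tower is located in Paris, France.", "refutes")],
    "The evidence places the tower in Paris, so the claim is false.",
)
print(example)
```

In a full pipeline, strings like `example` (paired with gold veracity labels) would form the fine-tuning set for the ELM, letting the smaller encoder benefit from the GLM's entailment reasoning without needing the GLM at inference time.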
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Veracity Prediction | RAWFC (test) | -- | -- | 28 |
| Veracity Prediction | Factify-2 | Macro F1 | 78 | 13 |
| Veracity Prediction | Mocheg | Macro F1 | 57 | 13 |
| Veracity Prediction | VERITE | Macro F1 | 64 | 13 |
| Veracity Prediction | X-Fact (test) | Macro F1 | 46 | 13 |
| Veracity Prediction | Ru22fact (test) | Macro F1 | 73 | 13 |
| Veracity Prediction | LIAR RAW | -- | -- | 6 |