End-to-End Multimodal Fact-Checking and Explanation Generation: A Challenging Dataset and Models
About
We propose end-to-end multimodal fact-checking and explanation generation. The input is a claim and a large collection of web sources, including articles, images, videos, and tweets. The goal is to assess the truthfulness of the claim by retrieving relevant evidence and predicting a truthfulness label (e.g., support, refute, or not enough information), and to generate a statement that summarizes and explains the reasoning and ruling process. To support this research, we construct Mocheg, a large-scale dataset of 15,601 claims, each annotated with a truthfulness label and a ruling statement, together with 33,880 textual paragraphs and 12,112 images as evidence. To establish baseline performance on Mocheg, we experiment with several state-of-the-art neural architectures on three pipelined subtasks: multimodal evidence retrieval, claim verification, and explanation generation. The results show that state-of-the-art models do not yet achieve satisfactory performance on end-to-end multimodal fact-checking. To the best of our knowledge, we are the first to build a benchmark dataset and solutions for end-to-end multimodal fact-checking and explanation generation. The dataset, source code, and model checkpoints are available at https://github.com/VT-NLP/Mocheg.
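The three pipelined subtasks can be pictured as three functions chained together. The sketch below is only an illustration of that structure, not the Mocheg implementation: the names `Evidence`, `Verdict`, `fact_check`, and `overlap_retriever` are hypothetical, and the word-overlap retriever is a toy stand-in for a learned multimodal retriever.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Label set taken from the example labels given in the abstract.
LABELS = ("support", "refute", "not enough information")


@dataclass
class Evidence:
    modality: str  # "text" or "image"
    content: str   # paragraph text, or a caption/path standing in for an image


@dataclass
class Verdict:
    label: str
    evidence: List[Evidence]
    explanation: str


def overlap_retriever(claim: str, corpus: Sequence[Evidence], k: int = 2) -> List[Evidence]:
    """Toy retriever (assumption): rank evidence by words shared with the claim."""
    claim_words = set(claim.lower().split())
    scored = [(len(claim_words & set(ev.content.lower().split())), ev) for ev in corpus]
    # Keep only items that share at least one word, best match first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [ev for _, ev in ranked[:k]]


def fact_check(
    claim: str,
    corpus: Sequence[Evidence],
    retrieve: Callable[[str, Sequence[Evidence]], List[Evidence]],
    verify: Callable[[str, List[Evidence]], str],
    explain: Callable[[str, List[Evidence], str], str],
) -> Verdict:
    """Chain the three subtasks: retrieval, then verification, then explanation."""
    evidence = retrieve(claim, corpus)
    label = verify(claim, evidence)
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return Verdict(label, evidence, explain(claim, evidence, label))


if __name__ == "__main__":
    corpus = [
        Evidence("text", "The bridge opened in 1937"),
        Evidence("image", "photo of a cat"),
    ]
    verdict = fact_check(
        "The bridge opened in 1937",
        corpus,
        overlap_retriever,
        # Placeholder verifier and explainer; real systems would use trained models.
        lambda claim, ev: "support" if ev else "not enough information",
        lambda claim, ev, label: f"{label}: based on {len(ev)} evidence item(s)",
    )
    print(verdict.label, "|", verdict.explanation)
```

Passing each stage in as a callable keeps the stages independent, so a retriever, verifier, or explainer could in principle be swapped or evaluated separately, mirroring the per-subtask baselines described above.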
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Claim Verification | AIChartClaim | Macro F1 | 59 | 38 |
| Claim Verification | ChartCheck | Macro F1 | 0.578 | 38 |
| Claim Verification | Mocheg | Macro F1 | 45.6 | 32 |
| Claim Verification | MR2 | Macro F1 | 68 | 32 |
| Explanation Generation | AIChartClaim 1.0 (test) | ROUGE-1 | 41.5 | 9 |
| Explanation Generation | AIChartClaim | ROUGE-L | 33.4 | 9 |
| Explanation Generation | ChartCheck 1.0 (test) | ROUGE-1 | 47.1 | 9 |
| Explanation Generation | ChartCheck | ROUGE-L | 39.6 | 9 |
| Explanation Generation | AIChartClaim (test) | ROUGE-1 | 39.5 | 9 |
| Explanation Generation | ChartCheck (test) | ROUGE-1 | 45.3 | 9 |
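
For readers unfamiliar with the claim verification metric, the following is a minimal sketch of macro-averaged F1 over the three truthfulness labels. It is not the evaluation script behind the numbers above; the function name `macro_f1` and the convention of scoring a label with no gold or predicted instances as 0.0 are assumptions. The function returns a value in [0, 1]; the table appears to mix that scale with percentages.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # Assumption: a label that never occurs in gold or pred scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    labels = ["support", "refute", "not enough information"]
    gold = ["support", "support", "refute", "not enough information"]
    pred = ["support", "refute", "refute", "not enough information"]
    # Per-label F1: support 2/3, refute 2/3, not enough information 1.
    print(round(macro_f1(gold, pred, labels), 4))  # 0.7778
```

Because every label contributes equally to the average, a model that ignores a rare label is penalized more heavily than it would be under plain accuracy.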