HerO at AVeriTeC: The Herd of Open Large Language Models for Verifying Real-World Claims
About
To tackle the AVeriTeC shared task hosted by FEVER-24, we introduce a system that employs only publicly available large language models (LLMs) for each step of automated fact-checking, dubbed the Herd of Open LLMs for verifying real-world claims (HerO). For evidence retrieval, a language model is used to enhance a query by generating hypothetical fact-checking documents. We prompt pretrained and fine-tuned LLMs for question generation and veracity prediction by crafting prompts with retrieved in-context samples. HerO achieved 2nd place on the leaderboard with an AVeriTeC score of 0.57, suggesting the potential of open LLMs for verifying real-world claims. For future research, we make our code publicly available at https://github.com/ssu-humane/HerO.
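The query-enhancement step described in the abstract can be illustrated with a minimal sketch. This is not the HerO implementation: the abstract does not specify the retriever, so a bag-of-words cosine similarity stands in for it, and `generate_hypothetical_docs` is a hypothetical placeholder that returns a fixed template where a real system would call an LLM. All function names here are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Return a lowercased term-frequency vector for the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def generate_hypothetical_docs(claim):
    """Hypothetical stand-in for an LLM that writes fact-checking passages.

    A real system would prompt a language model here; this stub only
    returns a fixed template so the example runs offline.
    """
    return [
        f"Fact check: {claim} Reviewers examined official records "
        "and statements to assess this claim."
    ]


def expand_query(claim, generate=generate_hypothetical_docs):
    """Concatenate the claim with the generated hypothetical documents."""
    return " ".join([claim, *generate(claim)])


def retrieve(claim, corpus, k=2, generate=generate_hypothetical_docs):
    """Rank corpus documents by similarity to the expanded query."""
    query_vec = bag_of_words(expand_query(claim, generate))
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(query_vec, bag_of_words(doc)),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    corpus = [
        "Official records show the city budget rose by 5 percent in 2023.",
        "A recipe for lemon cake with butter and sugar.",
        "The football match ended in a draw.",
    ]
    for doc in retrieve("The city budget doubled in 2023.", corpus, k=1):
        print(doc)
```

The point of the expansion is that the generated passage adds vocabulary typical of fact-checking articles (here, "official records"), which the bare claim lacks. Swapping the stub for an actual LLM call and the similarity function for a stronger retriever would leave the structure of `retrieve` unchanged.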
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Claim Verification | AVeriTeC Retrieved (H) (dev) | Accuracy | 70.2 | 28 |
| Claim Verification | AVeriTeC Retrieved (I) (dev) | Accuracy | 67.8 | 28 |
| Claim Verification | AVeriTeC Golden (dev) | Accuracy | 80.4 | 28 |
| Fact Checking | FEVER | Balanced Accuracy | 67.5 | 12 |
| Fact Checking | AVeriTeC (test) | Hu-METEOR (Q only) | 0.48 | 9 |
| Justification Quality Evaluation | AVeriTeC Retrieved (H) 50 correctly verified claims | MOS | 2.6 | 6 |