HerO at AVeriTeC: The Herd of Open Large Language Models for Verifying Real-World Claims

About

To tackle the AVeriTeC shared task hosted by FEVER-24, we introduce a system that employs only publicly available large language models (LLMs) for each step of automated fact-checking, dubbed the Herd of Open LLMs for verifying real-world claims (HerO). For evidence retrieval, a language model enhances the query by generating hypothetical fact-checking documents. For question generation and veracity prediction, we prompt pretrained and fine-tuned LLMs with prompts built from retrieved in-context samples. HerO achieved 2nd place on the leaderboard with an AVeriTeC score of 0.57, suggesting the potential of open LLMs for verifying real-world claims. For future research, we make our code publicly available at https://github.com/ssu-humane/HerO.
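The two retrieval ideas in the abstract, query enhancement with a hypothetical document and prompts built from retrieved in-context samples, can be illustrated with a minimal sketch. This is not the HerO implementation: the function names, the `generate` callable standing in for an LLM call, the bag-of-words cosine ranking (swapped in for whatever retriever the system actually uses), and the example labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens as a term-frequency Counter."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_query(claim, generate):
    """Append a generated hypothetical fact-checking document to the claim.

    `generate` is a hypothetical callable standing in for an LLM call.
    """
    return f"{claim} {generate(claim)}"


def rank(query, texts, k):
    """Return the k texts most similar to the query (ties keep input order)."""
    q = bag_of_words(query)
    return sorted(texts, key=lambda t: cosine(q, bag_of_words(t)), reverse=True)[:k]


def build_prompt(claim, labelled, k=2):
    """Build a few-shot prompt from the k labelled claims nearest to `claim`.

    `labelled` maps example claims to verdict strings (illustrative labels).
    """
    nearest = rank(claim, list(labelled), k)
    shots = [f"Claim: {c}\nVerdict: {labelled[c]}" for c in nearest]
    return "\n\n".join(shots + [f"Claim: {claim}\nVerdict:"])
```

In use, the expanded query would be passed to `rank` over a document collection, and the string returned by `build_prompt` would be sent to an LLM for the verdict. A real system would likely replace the bag-of-words scorer with a stronger sparse or dense retriever.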

Yejun Yoon, Jaeyoon Jung, Seunghyun Yoon, Kunwoo Park • 2024

Related benchmarks

Task	Dataset	Result
Claim Verification	AVeriTeC Retrieved (H) (dev)	Accuracy 70.2	28
Claim Verification	AVeriTeC Retrieved (I) (dev)	Accuracy 67.8	28
Claim Verification	AVeriTeC Golden (dev)	Accuracy 80.4	28
Fact Checking	FEVER	Balanced Accuracy 67.5	12
Fact Checking	AVeriTeC (test)	Hu-METEOR (Q only) 0.48	9
Justification Quality Evaluation	AVeriTeC Retrieved (H) 50 correctly verified claims	MOS 2.6	6
