AVeriTeC

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Claim Verification | AVeriTeC Retrieved (I) (dev) | Accuracy: 73.6 | 28 |
| Claim Verification | AVeriTeC Retrieved (H) (dev) | Accuracy: 72.8 | 28 |
| Claim Verification | AVeriTeC Golden (dev) | Accuracy: 83.4 | 28 |
| Ternary claim verification | AVeriTeC (test) | Balanced Accuracy: 48.5 | 26 |
| Document-level Claim Extraction | AVeriTeC-DCE (test) | chrF: 26.4 | 11 |
| Fact-checking | AVeriTeC (test) | Hu-METEOR (Q only): 0.48 | 9 |
| Faithfulness Evaluation | AVeriTeC | FID: 0.3 | 8 |
| Fact Verification | AVERITEC | Accuracy: 49.1 | 8 |
| Justification Quality Evaluation | AVeriTeC Retrieved (H) 50 correctly verified claims | MOS: 3.67 | 6 |
| Fact-checking | AVeriTeC (dev) | Hu-METEOR (Q only): 0.46 | 6 |
| Document-level Claim Extraction (Sentence) | AVeriTeC-DCE 1.0 (test) | SARI: 6.6 | 6 |
| Sentence Extraction | AVeriTeC-DCE 1.0 (test) | P@1: 47.8 | 6 |
| Document-level Claim Extraction (Decontextualized Sentence) | AVeriTeC-DCE 1.0 (test) | SARI: 6.7 | 5 |
| Claim Verification | AVeriTeC (dev) | Supported F1: 68 | 4 |
| Automated Fact-Checking | AVeriTeC (test) | Precision: 54 | 4 |
| Automated Fact Checking | AVeriTeC | Macro F1 Score: 77.6 | 3 |
| Claim Verification | AVERITEC | Accuracy: 0.7409 | 3 |
| Sentence extraction | AVeriTeC-DCE (full) | IsCheckWorthy: 44 | 2 |