FEVER

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Fact Verification | FEVER | Accuracy 53.9 | 72
End-to-End Defense in RAG | FEVER | ASR 0 | 63
Fact Verification | FEVER (dev) | Label Accuracy 82.1 | 57
Model Editing | FEVER | Efficacy 98.23 | 49
Fact Verification | FEVER (val) | True Deferral-Advice Loss 0.555 | 48
Complex reasoning | FEVER (test) | Macro F1 85.18 | 37
Model Editing | FEVER 20K edits (test) | Efficacy 99.07 | 36
Feature Attribution | FEVER | Comprehensiveness 0.75 | 33
Fact Verification | FEVER (test) | LA Score 79.47 | 32
Fact verification | FEVER | Accuracy 87.45 | 30
Lifelong Model Editing | FEVER | Efficacy 98.38 | 27
Fact Verification | FEVER 1.0 (dev) | Label Accuracy 89.07 | 23
Information Retrieval | FEVER BEIR | nDCG 0.948 | 22
Claim Correction | FEVER Retrieved evidence | SARI (%) 50.7141 | 21
Passage Reranking | FEVER BEIR | NDCG@10 73.47 | 19
Fact Verification | FEVER | EM 61.1 | 18
Fact Verification | FEVER | F1 Score 53.9 | 18
Fact Extraction and Verification | FEVER (test) | Label Accuracy (LA) 75.96 | 18
Explanation Evaluation | FEVER (test) | Sufficiency 9.72 | 16
Fact Verification | FEVER-Symmetric | Precision 88 | 16
Knowledge Poisoning Attack | FEVER k=10 (test) | Attack Success Rate (ASR) 73 | 15
Fact Verification (Adversarial Claim Rewriting) | FEVER | ASR 2.63 | 15
Sentence-Level Confidence Prediction | FEVER | AUROC 0.737 | 15
Document Reranking | FEVER | NDCG@5 81.556 | 14
Fact-checking | FEVER | F1 Macro 94.3 | 14
Showing 25 of 78 rows