FEVER

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Fact Verification | FEVER | Accuracy 53.9 | 67 |
| Fact Verification | FEVER (dev) | Label Accuracy 82.1 | 57 |
| Model Editing | FEVER | Efficacy 98.23 | 49 |
| Fact Verification | FEVER (val) | True Deferral-Advice Loss 0.555 | 48 |
| Fact Verification | FEVER (test) | LA Score 79.47 | 32 |
| Fact Verification | FEVER 1.0 (dev) | Label Accuracy 89.07 | 23 |
| Information Retrieval | FEVER BEIR | nDCG 0.948 | 22 |
| Fact Verification | FEVER | EM 61.1 | 18 |
| Fact Verification | FEVER | F1 Score 53.9 | 18 |
| Fact Extraction and Verification | FEVER (test) | Label Accuracy (LA) 75.96 | 18 |
| Explanation Evaluation | FEVER (test) | Sufficiency 9.72 | 16 |
| Fact Verification | FEVER-Symmetric | Precision 88 | 16 |
| Fact Verification (Adversarial Claim Rewriting) | FEVER | ASR 2.63 | 15 |
| Fact-checking | FEVER | F1 Macro 94.3 | 14 |
| Fact Verification | FEVER 1.0 (test) | Label Accuracy 74.07 | 14 |
| Classification | FEVER Symmetric v2 1.0 | Accuracy 69.1 | 13 |
| Classification | FEVER v1 (ID) | Accuracy 87.5 | 13 |
| Fact-Checking | FEVER | Balanced Accuracy 91.9 | 12 |
| Ad Hoc Retrieval | FEVER | NDCG@10 85.5 | 12 |
| Fact Verification | FEVER-S | Accuracy 54 | 12 |
| Fact Verification | FEVER | Accuracy 61.4 | 12 |
| Fact-verification | FEVER | Accuracy 73.73 | 11 |
| Fact Verification | FEVER (test) | Accuracy 99.7 | 10 |
| Sentence-Level Confidence Prediction | FEVER | AUROC 0.7 | 10 |
| global fact consistency verification | FEVER | Precision 99.5 | 10 |
Showing 25 of 57 rows