TruthfulQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | TruthfulQA | Accuracy 86.6 | 152 |
| Hallucination Detection | TruthfulQA (test) | AUC-ROC 89.5 | 105 |
| Truthfulness Evaluation | TruthfulQA | Accuracy 70.8 | 103 |
| Hallucination Detection | TruthfulQA | AUC (ROC) 0.9417 | 102 |
| Truthfulness | TruthfulQA | Truthfulness Accuracy 97.55 | 86 |
| Multiple-Choice | TruthfulQA | MC1 Accuracy 58.5 | 83 |
| Question Answering | TruthfulQA | Accuracy 86.64 | 73 |
| Factuality Evaluation | TruthfulQA | MC2 94.3 | 73 |
| Question Answering | TruthfulQA | TruthfulQA Score 63 | 61 |
| Factuality | TruthfulQA | Accuracy 83.41 | 60 |
| Question Answering | TruthfulQA MC1 | MC1 Accuracy 88.8 | 54 |
| Question Answering | TruthfulQA | Performance Score 81.1 | 52 |
| Machine-Generated Text Detection | TruthfulQA | TPR@FPR-1% 94.85 | 48 |
| Truthful Question Answering | TruthfulQA MC2 | MC2 Accuracy 56.46 | 46 |
| Hallucination | TruthfulQA | Score 75.76 | 42 |
| Question Answering | TruthfulQA | Truthful*Inf Score 88.23 | 42 |
| Open ended generation | TruthfulQA Without Rejected Samples open-ended (full) | Truthfulness 74.67 | 39 |
| Open ended generation | TruthfulQA With All Samples open-ended (full) | Truthfulness 82.75 | 39 |
| Multiple-Choice Question Answering | TruthfulQA MC1 | MC1 Accuracy 76.2 | 39 |
| Question Answering | TruthfulQA | MC1 49.93 | 35 |
| Hallucination Detection | TruthfulQA | AUROC 0.8851 | 33 |
| Truthfulness Evaluation | TruthfulQA | Reliability Score 16.9 | 33 |
| Truthfulness | TruthfulQA | Reward -1.8 | 32 |
| Correctness detection | TruthfulQA | AUC 0.97 | 30 |
| Truthfulness Evaluation | TruthfulQA (test) | MC1 54.95 | 30 |
Showing 25 of 152 rows