TruthfulQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Truthfulness Evaluation | TruthfulQA | Accuracy: 70.8 | 93 |
| Hallucination Detection | TruthfulQA (test) | AUC-ROC: 89.5 | 91 |
| Multiple-Choice | TruthfulQA | MC1 Accuracy: 58.5 | 83 |
| Question Answering | TruthfulQA | Accuracy: 86.6 | 82 |
| Question Answering | TruthfulQA | Accuracy: 86.64 | 73 |
| Machine-Generated Text Detection | TruthfulQA | TPR@FPR-1%: 94.85 | 48 |
| Hallucination Detection | TruthfulQA | AUC (ROC): 0.9417 | 47 |
| Hallucination | TruthfulQA | Score: 75.76 | 42 |
| Question Answering | TruthfulQA | Truthful*Inf Score: 88.23 | 42 |
| Factuality Evaluation | TruthfulQA | MC1: 49.75 | 40 |
| Open ended generation | TruthfulQA Without Rejected Samples open-ended (full) | Truthfulness: 74.67 | 39 |
| Open ended generation | TruthfulQA With All Samples open-ended (full) | Truthfulness: 82.75 | 39 |
| Multiple-Choice Question Answering | TruthfulQA MC1 | MC1 Accuracy: 76.2 | 33 |
| Truthfulness Evaluation | TruthfulQA | Reliability Score: 16.9 | 33 |
| Truthfulness Evaluation | TruthfulQA (test) | MC1: 54.95 | 30 |
| Question Answering | TruthfulQA MC1 | MC1 Accuracy: 88.8 | 24 |
| Truthfulness and Informativeness | TruthfulQA | TruthfulQA Score: 78.46 | 24 |
| Short-Answer Factuality | TruthfulQA (test) | MC1 Factuality Score: 47.47 | 24 |
| Truthfulness Evaluation | TruthfulQA medical (test) | Health Score: 83.6 | 22 |
| Question Answering | TRUTHFULQA | Factual Accuracy: 47 | 21 |
| Question Answering | TruthfulQA o=1 Domain-level split | Accuracy: 88.5 | 21 |
| Question Answering | TruthfulQA o=1 Semantic-level | Accuracy: 90.9 | 21 |
| Question Answering | TruthfulQA o=1 (Exact split) | Accuracy: 90 | 21 |
| Question Answering | TruthfulQA Domain-level split, o=3 | Accuracy: 92.8 | 21 |
| Question Answering | TruthfulQA Semantic-level split o=3 | Accuracy: 98.1 | 21 |
Showing 25 of 97 rows