
TruthfulQA

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Hallucination Detection | TruthfulQA | AUC (ROC) | 0.9417 | 178 |
| Question Answering | TruthfulQA | Accuracy | 86.6 | 164 |
| Hallucination Detection | TruthfulQA (test) | AUC-ROC | 89.5 | 112 |
| Truthfulness Evaluation | TruthfulQA | Accuracy | 75 | 108 |
| Factuality Evaluation | TruthfulQA | MC2 | 94.3 | 103 |
| Factuality | TruthfulQA | Accuracy | 83.41 | 97 |
| Hallucination Detection | TruthfulQA | AUROC | 0.8851 | 91 |
| Truthfulness | TruthfulQA | Truthfulness Accuracy | 97.55 | 86 |
| Multiple-Choice | TruthfulQA | MC1 Accuracy | 58.5 | 83 |
| Question Answering | TruthfulQA | Accuracy | 86.64 | 73 |
| Selective Generation | TruthfulQA | ROC-AUC | 0.744 | 66 |
| Question Answering | TruthfulQA | TruthfulQA Score | 63 | 61 |
| Truthfulness Evaluation | TruthfulQA | T·I Score | 84.7 | 59 |
| Question Answering | TruthfulQA MC1 | MC1 Accuracy | 88.8 | 54 |
| Machine-Generated Text Detection | TruthfulQA | TPR@FPR-1% (ChatGLM) | 98.38 | 54 |
| Question Answering | TruthfulQA | Performance Score | 81.1 | 52 |
| Truthfulness | TruthfulQA | Truthfulness Accuracy | 72.36 | 51 |
| Truthful Question Answering | TruthfulQA MC2 | MC2 Accuracy | 56.46 | 51 |
| Open-ended Generation | TruthfulQA | BLEURT Score | 70.13 | 48 |
| Predicting answer correctness | TruthfulQA | AUROC | 0.7272 | 48 |
| Truthful and Informative Generation | TruthfulQA (test) | True*Info (%) | 84.7 | 44 |
| Question Answering | TruthfulQA | MC2 | 68.25 | 43 |
| Generation correctness prediction | TruthfulQA (test) | AURC | 62.69 | 42 |
| Hallucination | TruthfulQA | Score | 75.76 | 42 |
| Question Answering | TruthfulQA | Truthful*Inf Score | 88.23 | 42 |
Showing 25 of 223 rows