
HaluEval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hallucination Detection | HaluEval (test) | AUC-ROC 97.1 | 126 |
| Hallucination Detection | HaluEval Dialogue latest (test) | Accuracy 84.88 | 22 |
| Hallucination Detection | HaluEval | Dialogue Score 72.2 | 15 |
| Question Answering | HaluEval QA | Accuracy 45.4 | 14 |
| Question Answering | HaluEval | EM 68 | 12 |
| Grounded Text Generation | HaluEval | F1 Score 72.66 | 11 |
| Groundedness | HaluEval | Kendall's Tau 0.78 | 11 |
| Hallucination Detection | HaluEval QA (test) | TPR 78.9 | 8 |
| Hallucination Detection | HaluEval Summarization (Starling-LM-7B-alpha) | TPR 81 | 7 |
| Hallucination Detection | HaluEval Sum | Accuracy (H) 37.46 | 7 |
| Hallucination Detection | HaluEval Summarization | Accuracy 50 | 6 |
| Instruction Following | HaluEval QAmis (test) | Failure Rate 0.0078 | 6 |
| Instruction Following | HaluEval (test) | Failure Rate (Sum) 0.36 | 6 |
| Hallucination Detection | HaluEval | AUROC 0.8021 | 6 |
| Question Answering | HaluEval qa_samples | F1 Score 86.7 | 5 |
| Hallucination Regeneration | HaluEval QA | Accuracy 69.45 | 5 |
| Hallucination Detection | HaluEval Dialogue (test) | Groundedness (Gamma) 0.287 | 1 |
| Hallucination Detection | HaluEval ChatGPT (test) | Coverage 94.5 | 1 |