Hallucination Detection on DiaHalu sampled (test)
[Chart: Support (Total Samples) over time. GPT-4, May 12, 2026: 245. Other selectable metrics: Precision, Recall, F1 Score, Accuracy.]
Evaluation Results
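| Method | Setting | Date | Support (Total Samples) | Precision (%) | Recall (%) | F1 Score (%) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| GPT-4 | evaluation_category=Ov... | 2026.05 | 245 | - | - | - | 54.3 |
| GPT-4 | evaluation_category=ma... | 2026.05 | 245 | 59.5 | 54.4 | 47.3 | - |
| GPT-4 | evaluation_category=we... | 2026.05 | 245 | 59.5 | 54.3 | 47.3 | - |
| GPT-4 | evaluation_category=Cl... | 2026.05 | 123 | 66.7 | 17.9 | 28.2 | - |
| GPT-4 | evaluation_category=Cl... | 2026.05 | 122 | 52.4 | 91 | 66.5 | - |

A "-" marks a metric not reported for that row.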
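The aggregate rows can be cross-checked against the two per-class rows. This is a minimal sketch that assumes, since the labels are truncated, that the rows marked `evaluation_category=ma...` and `evaluation_category=we...` are macro (unweighted) and support-weighted averages over the two `evaluation_category=Cl...` rows. The class names `class_1`/`class_2`, the constant names, and the 0.1-point tolerance are hypothetical choices; the tolerance is needed because every reported value is already rounded to one decimal.

```python
from math import isclose

# Per-class rows from the table: (hypothetical name, support, precision %, recall %).
# The real class labels are truncated in the source ("Cl...").
PER_CLASS = [
    ("class_1", 123, 66.7, 17.9),
    ("class_2", 122, 52.4, 91.0),
]

# Aggregate rows as reported: (support, precision, recall, F1).
# Reading "ma..." as macro and "we..." as support-weighted is an assumption.
REPORTED = {
    "ma...": (245, 59.5, 54.4, 47.3),
    "we...": (245, 59.5, 54.3, 47.3),
}
REPORTED_ACCURACY = 54.3  # from the "Ov..." row
TOLERANCE = 0.1  # inputs are rounded to one decimal place


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 if both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def aggregate(rows, weighted: bool):
    """Average per-class P, R, F1 equally (macro) or by support (weighted)."""
    total = sum(support for _, support, _, _ in rows)
    if weighted:
        weights = [support / total for _, support, _, _ in rows]
    else:
        weights = [1 / len(rows)] * len(rows)
    precision = sum(w * p for w, (_, _, p, _) in zip(weights, rows))
    recall = sum(w * r for w, (_, _, _, r) in zip(weights, rows))
    f1_score = sum(w * f1(p, r) for w, (_, _, p, r) in zip(weights, rows))
    return total, precision, recall, f1_score


def matches(computed, reported) -> bool:
    """Compare (support, P, R, F1) tuples, allowing for rounding."""
    return computed[0] == reported[0] and all(
        isclose(c, r, abs_tol=TOLERANCE) for c, r in zip(computed[1:], reported[1:])
    )


macro_row = aggregate(PER_CLASS, weighted=False)
weighted_row = aggregate(PER_CLASS, weighted=True)

# For single-label classification, support-weighted recall equals accuracy.
accuracy = weighted_row[2]

print("macro    :", matches(macro_row, REPORTED["ma..."]))
print("weighted :", matches(weighted_row, REPORTED["we..."]))
print("accuracy :", isclose(accuracy, REPORTED_ACCURACY, abs_tol=TOLERANCE))
```

Under these assumptions all three checks pass, and the per-class supports (123 + 122) sum to the 245 reported for the aggregate rows. Averaging the unrounded per-class F1 values gives about 47.37 for the macro row, which is why an exact comparison against the reported 47.3 would fail and a tolerance is used instead.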