Factuality Evaluation on HaluEval

68.7 Accuracy (Response)

GPT-4o Audio

8.17223.88639.655.314
Apr 14, 2026
Updated 3d ago

Evaluation Results

Method  Links
68.7   -      -     5.5   -      -      2026.04
51.3   61.2   -     -     -      -      2026.04
47     54.3   -     -     -      -      2026.04
43.2   -      0.2   3.3   3.5    6.93   2026.04
38.9   -      3.7   2     5.7    0.57   2026.04
36.3   42     0     3.1   3.1    0.37   2026.04
33.7   -      1.1   3.2   4.3    4.57   2026.04
28.8   -      0.1   2.8   2.9    2.58   2026.04
25     -      1     3.8   4.8    0.84   2026.04
21.2   -      0.3   4.2   4.4    0.74   2026.04
21     -      6.2   4.3   10.4   5.03   2026.04
18.7   -      0     3.1   3.1    0.22   2026.04
14     -      1.2   5.2   6.5    0.18   2026.04
10.5   -      0     2.1   2.1    0.22