
Factuality Evaluation on TriviaQA

Top result: GPT-4o Audio, 90.6 Response Accuracy
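The page does not say how Response Accuracy is scored. One common convention for TriviaQA is normalized exact match. Under that rule, a response is correct if, after lowercasing and removing punctuation and the articles "a", "an" and "the", it equals one of the gold answer aliases. The Python sketch below assumes that rule. The function names (`normalize_answer`, `is_correct`, `response_accuracy`) are hypothetical, and this leaderboard may score responses differently, for example by substring matching or with a model-based judge.

```python
import re
import string

# Hypothetical scorer: assumes SQuAD/TriviaQA-style normalized exact match,
# which may not be the rule this leaderboard actually uses.
_PUNCTUATION = set(string.punctuation)
_ARTICLES = re.compile(r"\b(a|an|the)\b")


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = _ARTICLES.sub(" ", text)
    return " ".join(text.split())


def is_correct(prediction: str, aliases: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold alias."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(alias) for alias in aliases)


def response_accuracy(predictions: list[str], gold_aliases: list[list[str]]) -> float:
    """Percentage (0-100) of predictions that match one of their gold aliases."""
    if len(predictions) != len(gold_aliases):
        raise ValueError("predictions and gold_aliases must have the same length")
    if not predictions:
        return 0.0
    hits = sum(is_correct(p, a) for p, a in zip(predictions, gold_aliases))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = ["The Eiffel Tower", "paris", "1969"]
    golds = [["Eiffel Tower"], ["Paris, France", "Paris"], ["1968"]]
    print(round(response_accuracy(preds, golds), 1))  # prints 66.7
```

Exact match is strict. A full-sentence reply that contains the right answer still counts as wrong under this rule. Scores in the table are therefore only comparable if every entry was scored the same way.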

[Chart: Response Accuracy, Apr 14, 2026]

Evaluation Results

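| Method       | Response Accuracy | Metric 2 | Metric 3 | Metric 4 | Metric 5 | Metric 6 | Date    |
| GPT-4o Audio | 90.6              | -        | -        | 5.5      | -        | -        | 2026.04 |
|              | 78.2              | 86.8     | -        | -        | -        | -        | 2026.04 |
|              | 77.5              | 84.9     | -        | -        | -        | -        | 2026.04 |
|              | 73.6              | -        | 3.7      | 2        | 5.7      | 0.57     | 2026.04 |
|              | 69.6              | 73.7     | 0        | 3.1      | 3.1      | 0.37     | 2026.04 |
|              | 66                | -        | -        | -        | -        | -        | 2026.04 |
|              | 63.8              | -        | 0.1      | 2.8      | 2.9      | 2.58     | 2026.04 |
|              | 62.1              | -        | 0.2      | 3.3      | 3.5      | 6.93     | 2026.04 |
|              | 61.7              | -        | 1        | 3.8      | 4.8      | 0.84     | 2026.04 |
|              | 58                | -        | 6.2      | 4.3      | 10.4     | 5.03     | 2026.04 |
|              | 58                | -        | 1.1      | 3.2      | 4.3      | 4.57     | 2026.04 |
|              | 53.9              | -        | 1.2      | 5.2      | 6.5      | 0.18     | 2026.04 |
|              | 50                | -        | -        | -        | -        | -        | 2026.04 |
|              | 48.3              | -        | -        | -        | -        | -        | 2026.04 |
|              | 39.1              | -        | 0.3      | 4.2      | 4.4      | 0.74     | 2026.04 |
|              | 29.7              | -        | 0        | 3.1      | 3.1      | 0.22     | 2026.04 |
|              | 27                | -        | -        | -        | -        | -        | 2026.04 |
|              | 22.8              | -        | 0        | 2.1      | 2.1      | 0.22     |         |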