
Factuality Evaluation on WebQ

81 Accuracy (Response)

GPT-4o Audio

Apr 14, 2026
Updated 3d ago

Evaluation Results

Method  Links
81     -     -    5.5  -     -     2026.04
75.1   -     6.2  4.3  10.4  5.03  2026.04
70.2   -     0.2  3.3  3.5   6.93  2026.04
68.9   77.7  -    -    -     -     2026.04
68.8   -     3.7  2    5.7   0.57  2026.04
67.2   71.5  0    3.1  3.1   0.37  2026.04
66.1   73.5  -    -    -     -     2026.04
64.5   -     1    3.8  4.8   0.84  2026.04
62     -     1.1  3.2  4.3   4.57  2026.04
55     -     -    -    -     -     2026.04
50.5   -     -    -    -     -     2026.04
50.2   -     -    -    -     -     2026.04
44.7   -     1.2  5.2  6.5   0.18  2026.04
40.4   -     0.1  2.8  2.9   2.58  2026.04
37     -     0    3.1  3.1   0.22  2026.04
32.2   -     0.3  4.2  4.4   0.74  2026.04
29.3   -     -    -    -     -     2026.04
26.6   -     0    2.1  2.1   0.22