
Factuality Evaluation on LlamaQ

88.4 Response Accuracy

GPT-4o Audio

[Chart: Response Accuracy; y-axis ticks 58.552, 66.301, 74.05, 81.799; x-axis label Apr 14, 2026]
Updated 3d ago

Evaluation Results

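| Method | Response Accuracy | (unlabeled) | (unlabeled) | (unlabeled) | (unlabeled) | (unlabeled) | Date | Links |
|---|---|---|---|---|---|---|---|---|
| | 88.4 | - | - | 5.5 | - | - | 2026.04 | |
| | 84.7 | - | 3.7 | 2 | 5.7 | 0.57 | 2026.04 | |
| | 81 | - | 6.2 | 4.3 | 10.4 | 5.03 | 2026.04 | |
| | 80.6 | 87.8 | - | - | - | - | 2026.04 | |
| | 80.3 | 83 | 0 | 3.1 | 3.1 | 0.37 | 2026.04 | |
| | 80 | - | - | - | - | - | 2026.04 | |
| | 79.3 | - | 0.2 | 3.3 | 3.5 | 6.93 | 2026.04 | |
| | 79 | - | 1.1 | 3.2 | 4.3 | 4.57 | 2026.04 | |
| | 78.9 | - | - | - | - | - | 2026.04 | |
| | 78.4 | - | 1 | 3.8 | 4.8 | 0.84 | 2026.04 | |
| | 78.2 | 84.6 | - | - | - | - | 2026.04 | |
| | 73.3 | - | - | - | - | - | 2026.04 | |
| | 73 | - | 0.1 | 2.8 | 2.9 | 2.58 | 2026.04 | |
| | 72 | - | 1.2 | 5.2 | 6.5 | 0.18 | 2026.04 | |
| | 64.7 | - | 0.3 | 4.2 | 4.4 | 0.74 | 2026.04 | |
| | 62.3 | - | 0 | 2.1 | 2.1 | 0.22 | 2026.04 | |
| | 61.2 | - | 0 | 3.1 | 3.1 | 0.22 | 2026.04 | |
| | 59.7 | - | - | - | - | - | | |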