LLM Chat Evaluation on ArenaHard

Accuracy: 49.2

Official Instruct Model

[Chart: ArenaHard accuracy, date shown: May 31, 2026]
Updated 1d ago

Evaluation Results

Date     Accuracy
2026.05  49.2
2026.05  17.32
2026.05  14.34
2026.05  14.11
2026.05  10.21
2026.05  9
2026.05  7.78
2026.05  6.49
2026.05  6.42
2026.05  2.18
2026.05  0.42
2026.05  0.42
2026.05  0.42
2026.05  0.42
2026.05  0.42
2026.05  0.24
2026.05  0.21