Open-ended Writing on ArenaHard
[Chart: Accuracy over time. Highlighted entry: Official Instruct Model, Accuracy 50, May 31, 2026. Updated 1d ago.]
Evaluation Results
| Method | Evaluation Category | Date | Accuracy |
| --- | --- | --- | --- |
| Official Instruct Model | Of... | 2026.05 | 50 |
| Opus-4.7 (xHigh) | Ag... | 2026.05 | 33.84 |
| GLM-4.7 & ANDES | Pr... | 2026.05 | 12.94 |
| GLM-4.7 (Scaffold-only) | Pr... | 2026.05 | 5.7 |
| Opus-4.6 (1M) | Ag... | 2026.05 | 3.21 |
| Gemini-3.1-Pro | Ag... | 2026.05 | 2.27 |
| MiniMax-M2.5 | Ag... | 2026.05 | 1.76 |
| GPT-5.2 | Ag... | 2026.05 | 1.27 |
| Opus-4.6 | Ag... | 2026.05 | 1.15 |
| Base Model (Qwen3-1.7B) | Ze... | 2026.05 | 0.91 |
| MiniMax-M2.1 | Ag... | 2026.05 | 0.91 |
| Qwen3-Max | Ag... | 2026.05 | 0.91 |
| Kimi-K2-Thinking | Ag... | 2026.05 | 0.91 |
| GPT-5.4 (High) | Ag... | 2026.05 | 0.91 |
| GLM-4.7 (OpenCode) | Pr... | 2026.05 | 0.91 |
| Sonnet-4.5 | Ag... | 2026.05 | 0.21 |
| GPT-5.1-Codex-Max | Ag... | 2026.05 | 0.14 |