Open-ended Writing on ArenaHard

Accuracy: 50

Official Instruct Model

Updated 1mo ago

Evaluation Results

Method	Date	Links	Accuracy
Official Instruct Model	2026.05		50
Opus-4.7 (xHigh)	2026.05		33.84
GLM-4.7 & ANDES	2026.05		12.94
GLM-4.7 (Scaffold-only)	2026.05		5.7
Opus-4.6 (1M)	2026.05		3.21
Gemini-3.1-Pro	2026.05		2.27
MiniMax-M2.5	2026.05		1.76
GPT-5.2	2026.05		1.27
Opus-4.6	2026.05		1.15
Base Model (Qwen3-1.7B)	2026.05		0.91
MiniMax-M2.1	2026.05		0.91
Qwen3-Max	2026.05		0.91
Kimi-K2-Thinking	2026.05		0.91
GPT-5.4 (High)	2026.05		0.91
GLM-4.7 (OpenCode)	2026.05		0.91
Sonnet-4.5	2026.05		0.21
GPT-5.1-Codex-Max	2026.05		0.14