
Health Dialogue on HealthBench

Accuracy: 44.92 (Official Instruct Model)

Updated 1mo ago

Evaluation Results

Method	Date	Accuracy
Official Instruct Model	2026.05	44.92
GPT-5.4 (High)	2026.05	29.39
GLM-4.7 & ANDES	2026.05	25.87
GLM-4.7 (Scaffold-only)	2026.05	16.74
Opus-4.7 (xHigh)	2026.05	16.23
Gemini-3.1-Pro	2026.05	10.93
Opus-4.6 (1M)	2026.05	9.73
GPT-5.2	2026.05	9.33
Opus-4.6	2026.05	8.58
MiniMax-M2.5	2026.05	7.66
Base Model (Qwen3-1.7B)	2026.05	7.54
MiniMax-M2.1	2026.05	7.54
Qwen3-Max	2026.05	7.54
Kimi-K2-Thinking	2026.05	7.54
GPT-5.1-Codex-Max	2026.05	7.54
GLM-4.7 (OpenCode)	2026.05	7.54
Sonnet-4.5	2026.05	0.00