Health Dialogue on HealthBench
Top result: 44.92 Accuracy (Official Instruct Model)
[Figure: Accuracy by method, dated May 31, 2026]
Evaluation Results
| Method                  | Evaluation Category | Date    | Accuracy |
|-------------------------|---------------------|---------|----------|
| Official Instruct Model | Of...               | 2026.05 | 44.92    |
| GPT-5.4 (High)          | Ag...               | 2026.05 | 29.39    |
| GLM-4.7 & ANDES         | Pr...               | 2026.05 | 25.87    |
| GLM-4.7 (Scaffold-only) | Pr...               | 2026.05 | 16.74    |
| Opus-4.7 (xHigh)        | Ag...               | 2026.05 | 16.23    |
| Gemini-3.1-Pro          | Ag...               | 2026.05 | 10.93    |
| Opus-4.6 (1M)           | Ag...               | 2026.05 | 9.73     |
| GPT-5.2                 | Ag...               | 2026.05 | 9.33     |
| Opus-4.6                | Ag...               | 2026.05 | 8.58     |
| MiniMax-M2.5            | Ag...               | 2026.05 | 7.66     |
| Base Model (Qwen3-1.7B) | Ze...               | 2026.05 | 7.54     |
| MiniMax-M2.1            | Ag...               | 2026.05 | 7.54     |
| Qwen3-Max               | Ag...               | 2026.05 | 7.54     |
| Kimi-K2-Thinking        | Ag...               | 2026.05 | 7.54     |
| GPT-5.1-Codex-Max       | Ag...               | 2026.05 | 7.54     |
| GLM-4.7 (OpenCode)      | Pr...               | 2026.05 | 7.54     |
| Sonnet-4.5              | Ag...               | 2026.05 | 0        |