Medical and Health Knowledge on HealthBench
[Chart: Accuracy by date (May 31, 2026); top result GLM-4.7 & ANDES (Ours) at 37.2]
Evaluation Results
Method | Protocol | Date | Accuracy
GLM-4.7 & ANDES (Ours) | Proposed Meth... | 2026.05 | 37.2
Official Instruct Model | Official Inst... | 2026.05 | 29.58
GPT-5.2 | Agent-Post-Tr... | 2026.05 | 21.92
Opus-4.6 (1M) | Agent-Post-Tr... | 2026.05 | 21.12
Opus-4.6 | Agent-Post-Tr... | 2026.05 | 18.81
GPT-5.4 (High) | Agent-Post-Tr... | 2026.05 | 18.64
Gemini-3.1-Pro | Agent-Post-Tr... | 2026.05 | 18.45
Opus-4.7 (xHigh) | Agent-Post-Tr... | 2026.05 | 16.53
GLM-4.7 (Scaffold-only) | Proposed Meth... | 2026.05 | 7.38
Base Model (SmolLM3-3B) | Zero-Shot, Ba... | 2026.05 | 0
Sonnet-4.5 | Agent-Post-Tr... | 2026.05 | 0
Qwen3-Max | Agent-Post-Tr... | 2026.05 | 0
Kimi-K2-Thinking | Agent-Post-Tr... | 2026.05 | 0
MiniMax-M2.1 | Agent-Post-Tr... | 2026.05 | 0
GPT-5.1-Codex-Max | Agent-Post-Tr... | 2026.05 | 0
MiniMax-M2.5 | Agent-Post-Tr... | 2026.05 | 0
GLM-4.7 (OpenCode) | Proposed Meth... | 2026.05 | 0