Medical Question Answering on HealthBench Overall
Chart: Overall Score over time (best result as of Feb 10, 2026: Baichuan-M2-32B, 60.1)
Evaluation Results
Method | Tags | Date | Overall Score
Baichuan-M2-32B | Type=Specialized LLM | 2026.02 | 60.1
o3 | Type=Closed-source LLM | 2026.02 | 59.8
Qwen3-30B-A3B-Instruct + More Query Rubrics | backbone=Qwen3-30B-A3B... | 2026.02 | 59.5
Qwen3-4B-Instruct + More Query Rubrics | backbone=Qwen3-4B-Inst... | 2026.02 | 52.9
Gemini-2.5-Pro | Type=Closed-source LLM | 2026.02 | 52.0
Qwen3-4B-Instruct + Principle Rubrics | backbone=Qwen3-4B-Inst... | 2026.02 | 51.1
Qwen3-4B-Instruct + Doctor Rubrics | backbone=Qwen3-4B-Inst... | 2026.02 | 51.0
Qwen3-235B-Instruct | Type=Open-source LLM | 2026.02 | 50.0
GPT-4.1 | Type=Closed-source LLM | 2026.02 | 47.9
HuatuoGPT-o1-72B | Type=Specialized LLM | 2026.02 | 47.9
Deepseek-R1 | Type=Open-source LLM | 2026.02 | 47.4
Qwen3-4B-Instruct + Draft Rubrics | backbone=Qwen3-4B-Inst... | 2026.02 | 46.9
Qwen3-30B-A3B-Instruct | Type=Our Method baseline | 2026.02 | 46.8
Qwen3-32B | Type=Open-source LLM | 2026.02 | 46.1
Qwen3-4B-Instruct | Type=Our Method baseline | 2026.02 | 40.6
Claude-3.7-Sonnet | Type=Closed-source LLM | 2026.02 | 34.6