Medical Reasoning on CMB
[Chart: Exact Match (EM) by date. Top result: GraphWalker, 84.05 (Apr 8, 2026)]
Evaluation Results
Method        Backbone LLM  Date     Exact Match (EM)
GraphWalker   Qwen3-14B     2026.04  84.05
Influence     Qwen3-14B     2026.04  83.5
Delta-KNN     Qwen3-14B     2026.04  83.02
LMS3          Qwen3-14B     2026.04  82.02
SPELL         Qwen3-14B     2026.04  81.98
Random        Qwen3-14B     2026.04  81.5
GradSel       Qwen3-14B     2026.04  81.48
Semantic-emb  Qwen3-14B     2026.04  80.52
CONE          Qwen3-14B     2026.04  80.41
Zero-shot     Qwen3-14B     2026.04  80.21
IDS           Qwen3-14B     2026.04  79.57
Time-series   Qwen3-14B     2026.04  78.05
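All scores in the table are Exact Match (EM). As a rough illustration of how such a score is usually computed (the percentage of questions whose predicted answer equals the reference answer), here is a minimal Python sketch. The function name `exact_match` and the normalization step (trimming whitespace and uppercasing option letters) are assumptions for illustration only, not the benchmark's official scorer, and the answers in the example are made up rather than taken from CMB.

```python
def exact_match(predictions, references):
    """Return EM as a percentage of predictions that equal their reference.

    Assumption: answers are compared after a simple, hypothetical
    normalization; the official CMB scoring script may differ.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0

    def normalize(answer):
        # Hypothetical normalization: trim whitespace and uppercase,
        # so " c" and "C" are treated as the same option letter.
        return answer.strip().upper()

    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Toy example with invented answers (not CMB data).
print(round(exact_match(["A", " c", "BD"], ["A", "C", "B"]), 2))  # 66.67
```

Under strict EM a partially correct multi-option answer such as "BD" against "B" counts as a miss, which is why the toy example scores 2 out of 3.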