Medical Reasoning on DDXPlus
Best Performance Score: 81.1 (Gemini-2.5-pro, Jan 6, 2026)
Evaluation Results
| Method | Setting | Date | Performance Score | Cost ($) | Inference Delay (h) |
| --- | --- | --- | --- | --- | --- |
| Gemini-2.5-pro | Manual | 2026.01 | 81.1 | 109.85 | 27.65 |
| EvoRoute | Ours | 2026.01 | 79.5 | 65.8 | 20.53 |
| MasRouter | Routing | 2026.01 | 73.1 | 92.38 | 32.3 |
| GraphRouter | Routing | 2026.01 | 62.5 | 119.5 | 27.5 |
| PromptLLM | Routing | 2026.01 | 60.07 | 127.05 | 26.88 |
| GPT-4o | Manual | 2026.01 | 57.58 | 135.1 | 34.31 |
| GPT-4.1 | Manual | 2026.01 | 55.3 | 89.82 | 21.95 |
| Qwen3-14b | Manual | 2026.01 | 41.26 | 14.16 | 31.67 |