Medical Reasoning on Medical task
Best Accuracy: 43.7 (SDFT)
[Chart: Accuracy over time, with a data point at Jan 27, 2026. Available metrics: Accuracy, Average Token Usage]
Updated 3d ago
Evaluation Results
Method            Details                     Date      Accuracy   Average Token Usage
SDFT              Base Model=Olmo-3-7B-T...   2026.01   43.7       4,180
Olmo-3-7B-Think   Fine-tuning strategy=B...   2026.01   31.2       4,612
SFT               Base Model=Olmo-3-7B-T...   2026.01   23.5       3,273