Clinical Reasoning on MEDREASON
Best result: 73.4 Pass@1 (Outcome Reward)
[Chart: Pass@1 over time (Oct 2, 2025)]
Evaluation Results
Method           Backbone     Date     Pass@1 (%)
Outcome Reward   Llama3.1-8B  2025.10  73.4
Ours (Sparse)    Llama3.1-8B  2025.10  73.1
Ours (Interval)  Llama3.1-8B  2025.10  71.4
Ours (Dense)     Llama3.1-8B  2025.10  69.8
Ours (Sparse)    Qwen3-4B     2025.10  66.1
Outcome Reward   Qwen2.5-7B   2025.10  65.0
Ours (Interval)  Qwen2.5-7B   2025.10  63.8
SFT              Llama3.1-8B  2025.10  63.5
Ours (Dense)     Qwen2.5-7B   2025.10  58.3
SFT              Qwen3-4B     2025.10  57.6
Outcome Reward   Qwen3-4B     2025.10  56.3
Ours (Dense)     Qwen3-4B     2025.10  53.7
SFT              Qwen2.5-7B   2025.10  53.0
Ours (Interval)  Qwen3-4B     2025.10  52.1
Ours (Sparse)    Qwen2.5-7B   2025.10  25.2
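The page does not say exactly how Pass@1 is computed for MEDREASON. Assuming it follows the common convention, the score is the per-problem probability that a single sampled answer is correct, averaged over problems and reported as a percentage. When n answers are sampled per problem and c of them are correct, the widely used unbiased pass@k estimator 1 - C(n-c, k) / C(n, k) reduces to c / n for k = 1. The sketch below illustrates that assumption only; the function names `pass_at_k` and `benchmark_pass_at_1` are hypothetical and are not taken from the benchmark's own evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem (assumed convention).

    n: number of sampled answers, c: how many were correct,
    k: how many samples are allowed. Returns the probability that at least
    one of k samples drawn without replacement from the n is correct.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # If fewer than k answers are wrong, every size-k draw contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Hypothetical aggregate: mean pass@1 over problems, as a percentage.

    results holds one (n, c) pair per problem.
    """
    scores = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Four problems, four samples each, with 4, 2, 0 and 1 correct answers.
print(benchmark_pass_at_1([(4, 4), (4, 2), (4, 0), (4, 1)]))  # 43.75
```

With one sample per problem (n = 1), this collapses to plain accuracy over the test set, which is how a single-number Pass@1 column like the one above is usually read.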