Medical Reasoning on RaR Medicine
[Chart: WR vs Base. Highlighted result: Full DRO, 57.6 (Jun 16, 2025).]
Updated 23d ago
Evaluation Results
| Method | Details | Date | WR vs Base |
|---|---|---|---|
| Full DRO | Reward=DRO, Rollout-Gr... | 2025.06 | 57.6 |
| R3 | Reward=R3, Rollout-Gro... | 2025.06 | 56.1 |
| R3 | Reward=R3, Rollout-Gro... | 2025.06 | 54.2 |
| Rubric (RLER) | Reward=Rubric (RLER),... | 2025.06 | 53.4 |
| Avg Prob (RLPR) | Reward=Avg Prob (RLPR)... | 2025.06 | 52.3 |
| Base | Reward=Base | 2025.06 | 50 |
| Avg Logprob (VeriFree) | Reward=Avg Logprob (Ve... | 2025.06 | 48 |
| RL-F1 | Reward=RL-F1, Rollout-... | 2025.06 | 46.7 |