Reasoning Evaluation on ParaRev
[Chart: WR vs. Base. Highlighted result: Full DRO, 63.7 (Jun 16, 2025). Updated 23d ago.]
Evaluation Results
Method                   Configuration               Date      WR vs. Base
Full DRO                 Reward=DRO, Rollout-Gr...   2025.06   63.7
R3                       Reward=R3, Rollout-Gro...   2025.06   61.6
R3                       Reward=R3, Rollout-Gro...   2025.06   57.8
Rubric (RLER)            Reward=Rubric (RLER),...    2025.06   55.9
Avg Prob (RLPR)          Reward=Avg Prob (RLPR)...   2025.06   54.3
Base                     Reward=Base                 2025.06   50
Avg Logprob (VeriFree)   Reward=Avg Logprob (Ve...   2025.06   49.5
RL-F1                    Reward=RL-F1, Rollout-...   2025.06   40.8