Mathematical Reasoning on AIME 25 (Peak avg@32 score)
Best result: OTB, Peak avg@32 Score 30.31
[Chart: Peak avg@32 Score, data point dated Feb 6, 2026]
Evaluation Results
Method  Configuration              Date     Peak avg@32 Score
OTB     Backbone=Qwen3-8b-Base...  2026.02  30.31
OGB     Backbone=Qwen3-8b-Base...  2026.02  30.1
OPO     Backbone=Qwen3-8b-Base...  2026.02  27.29
GRPO    Backbone=Qwen3-8b-Base...  2026.02  25.31
RLOO    Backbone=Qwen3-8b-Base...  2026.02  24.38
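
The page does not define the Peak avg@32 Score, so the following is only a rough sketch of one common reading: avg@k as the percentage of correct answers averaged over k sampled answers per problem, and "peak" as the best such value across saved training checkpoints. Both definitions, and the names avg_at_k and peak_avg_at_k, are assumptions for illustration and are not taken from this leaderboard.

```python
from statistics import mean


def avg_at_k(correct):
    """Assumed avg@k: correct[p][s] is True if sample s for problem p was right.

    Returns the mean per-problem accuracy as a percentage.
    """
    per_problem = [mean(1.0 if c else 0.0 for c in samples) for samples in correct]
    return 100.0 * mean(per_problem)


def peak_avg_at_k(checkpoints):
    """Assumed 'peak': the best avg@k over a list of checkpoint result grids."""
    return max(avg_at_k(grid) for grid in checkpoints)


# Toy data: 2 checkpoints, 2 problems, k = 2 samples each (k = 32 on this page).
toy_checkpoints = [
    [[True, False], [False, False]],  # checkpoint A
    [[True, True], [True, False]],    # checkpoint B
]
print(peak_avg_at_k(toy_checkpoints))
```

With k = 32 and the 30 AIME 25 problems, each checkpoint grid would hold 30 rows of 32 booleans; the toy grid above only keeps the arithmetic easy to follow.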