Mathematical Reasoning on AIME 2025 (Mean@32 accuracy)
Chart: Mean@32 Accuracy over time. Top entry: HTPO, 66 (May 8, 2026). Updated 22d ago.
Evaluation Results
| Method | Backbone | Date | Mean@32 Accuracy |
|---|---|---|---|
| HTPO | Qwen3-8B-Inst... | 2026.05 | 66 |
| DAPO | Qwen3-8B-Inst... | 2026.05 | 64.7 |
| GSPO | Qwen3-8B-Inst... | 2026.05 | 63.7 |
| SAPO | Qwen3-8B-Inst... | 2026.05 | 63.6 |
| 80/20-Rule | Qwen3-8B-Inst... | 2026.05 | 63.4 |
| BAPO | Qwen3-8B-Inst... | 2026.05 | 59.2 |
| GRPO† | Qwen3-8B-Inst... | 2026.05 | 58.6 |
| HTPO | Qwen3-8B-Base | 2026.05 | 30.4 |
| GSPO | Qwen3-8B-Base | 2026.05 | 30.3 |
| SAPO | Qwen3-8B-Base | 2026.05 | 28.4 |
| 80/20-Rule | Qwen3-8B-Base | 2026.05 | 25.9 |
| BAPO | Qwen3-8B-Base | 2026.05 | 24.7 |
| DAPO | Qwen3-8B-Base | 2026.05 | 23.7 |
| GRPO† | Qwen3-8B-Base | 2026.05 | 22.9 |