Mathematical Reasoning on AIME 2024 (Pass@1, Pass@16, Token Metrics)
[Chart: Mean@16 over time. Highlighted point: OptPO-SFT, Mean@16 = 13.5, Dec 2, 2025. Selectable metrics: Mean@16, Pass@16, Pass@1, Tokens Used (M), Token Saving.]
Updated 4d ago
Evaluation Results
Method     Backbone           Date     Mean@16  Pass@16  Pass@1  Tokens Used (M)  Token Saving
OptPO-SFT  Qwen2.5-Math-...   2025.12  13.5     50       20      -                0.61
TTSFT      Qwen2.5-Math-...   2025.12  13.1     46.7     20      -                -
OptPO-SFT  Llama-3.1-8B-...   2025.12  6.2      36.7     3.3     -                15.33
TTSFT      Llama-3.1-8B-...   2025.12  3.8      26.7     6.7     -                -