Mathematical Reasoning on AMC (Pass@16, Mean@16, Token Usage)
[Chart: Mean@16 over time, Dec 2, 2025 to Mar 2, 2026. Top result: ME-ICPO, 47.06. Selectable metrics: Mean@16, Pass@16, Pass@1, Total Tokens (M), Token Saving.]
Updated 1mo ago
Evaluation Results
| Method | Backbone | Date | Mean@16 | Pass@16 | Pass@1 | Total Tokens (M) | Token Saving |
|---|---|---|---|---|---|---|---|
| ME-ICPO | Qwen2.5-Math-... | 2026.03 | 47.06 | - | - | - | - |
| TTRL | Qwen2.5-Math-... | 2026.03 | 45.18 | - | - | - | - |
| OptPO-SFT | Qwen2.5-Math-... | 2025.12 | 39.7 | 81.9 | 43.4 | - | 12.08 |
| TTSFT | Qwen2.5-Math-... | 2025.12 | 37.9 | 80.7 | 42.2 | - | - |
| ToT (Maj vote) | Qwen2.5-Math-... | 2026.03 | 29.37 | - | - | - | - |
| TTSFT | Llama-3.1-8B-... | 2025.12 | 20.1 | 63.9 | 28.9 | - | - |
| OptPO-SFT | Llama-3.1-8B-... | 2025.12 | 18.7 | 60.2 | 19.3 | - | 17.16 |
| ToT (self eval) | Qwen2.5-Math-... | 2026.03 | 16.19 | - | - | - | - |
| MCTR | Qwen2.5-Math-... | 2026.03 | 1.2 | - | - | - | - |
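
The page does not define its Mean@16 and Pass@16 columns. Under the common convention (an assumption here, not something this leaderboard confirms), k answers are sampled per problem; Mean@k is the average fraction of correct samples, and Pass@k is the fraction of problems with at least one correct sample. A minimal sketch of that convention, with hypothetical function names and toy data:

```python
from typing import Sequence


def mean_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Average per-sample accuracy over all problems, as a percentage.

    results[i][j] is True if sample j for problem i was correct.
    Assumed definition; the leaderboard does not state its own.
    """
    per_problem = [sum(r) / len(r) for r in results]
    return 100.0 * sum(per_problem) / len(per_problem)


def pass_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Percentage of problems with at least one correct sample (assumed definition)."""
    solved = [any(r) for r in results]
    return 100.0 * sum(solved) / len(solved)


# Toy data (not from the leaderboard): 2 problems, k = 4 samples each.
toy = [
    [True, False, False, False],   # 1 of 4 samples correct
    [False, False, False, False],  # never solved
]
toy_mean = mean_at_k(toy)  # (0.25 + 0.0) / 2 as a percentage
toy_pass = pass_at_k(toy)  # 1 of 2 problems solved at least once
```

Under these assumed definitions Pass@k can never be lower than Mean@k for the same samples, which matches the rows above that report both columns.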