
Mathematical Reasoning on AIME25 (ACC, mean@8)

70 Accuracy (ACC): Qwen3-4B (Ours)

[Figure: Accuracy (ACC) over time, Apr 15, 2026 to Apr 29, 2026]
Updated 1mo ago

Evaluation Results

Date      ACC     mean@8
2026.04   70      -
2026.04   66.7    -
2026.04   43.3    -
2026.04   27.5    27.5
2026.04   26.25   26.25
2026.04   25.83   25.83
2026.04   25.42   25.42
2026.04   24.58   24.58
2026.04   24.58   24.58
2026.04   24.58   24.58
2026.04   24.17   24.17
2026.04   23.75   23.75
2026.04   23.33   23.33
2026.04   19.8    -
2026.04   19.4    -
2026.04   19      -
2026.04   18.3    -
2026.04   18.1    -
2026.04   17.7    -
2026.04   16.9    -
2026.04   16.67   16.67
2026.04   16.67   16.67
2026.04   16.2    -
2026.04   15.8    -
2026.04   15.42   15.42
2026.04   15      15
2026.04   14.58   14.58
2026.04   14.2    -
2026.04   12.9    -
2026.04   9.17    9.17
2026.04   3.75    3.75
2026.04   2.08    2.08
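
The page does not define mean@8. A common reading, assumed here, is accuracy averaged over 8 independently sampled answers per problem. Scores such as 27.5 fit that reading for a 30-problem benchmark (66 correct out of 240 samples). The sketch below shows how such a score might be computed. The function name `mean_at_k` and the input layout are hypothetical, not taken from the leaderboard.

```python
from typing import Sequence


def mean_at_k(correct: Sequence[Sequence[bool]], k: int = 8) -> float:
    """Return accuracy in percent, averaged over k samples per problem.

    `correct[i][j]` is True when sample j for problem i was graded correct.
    Assumption: every problem has exactly k graded samples.
    """
    if not correct:
        raise ValueError("no problems given")
    per_problem = []
    for samples in correct:
        if len(samples) != k:
            raise ValueError(f"expected {k} samples, got {len(samples)}")
        # Fraction of the k samples that were correct for this problem.
        per_problem.append(sum(samples) / k)
    # Average the per-problem fractions and convert to percent.
    return 100.0 * sum(per_problem) / len(per_problem)


# Hypothetical grading results for 30 problems with 8 samples each:
# 8 problems always solved, 1 problem solved in 2 of 8 samples, 21 never solved.
results = (
    [[True] * 8 for _ in range(8)]
    + [[True, True] + [False] * 6]
    + [[False] * 8 for _ in range(21)]
)
print(round(mean_at_k(results), 2))  # 27.5
```

Averaging over several samples makes the score less sensitive to sampling noise than a single-attempt accuracy, which matters on a benchmark with only 30 problems.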