Mathematical Reasoning on AIME25 (Accuracy, Average response length)
[Chart: Accuracy and Avg Response Length. Top result: Qwen3-4B-Thinking, Accuracy 80.3 (Jan 7, 2026).]
Updated 3d ago
Evaluation Results
| Method | Method Category | Date | Accuracy | Avg Response Length |
|---|---|---|---|---|
| Qwen3-4B-Thinking | Base M... | 2026.01 | 80.3 | 23,912 |
| CoD | Prompt... | 2026.01 | 76.7 | 17,338 |
| Ada-R1 | Traini... | 2026.01 | 68.9 | 11,969 |
| Task Arithmetic | Data-f... | 2026.01 | 67.8 | 11,395 |
| RPAM | Data-d... | 2026.01 | 67.8 | 10,157 |
| TIES Merging | Data-f... | 2026.01 | 60 | 10,891 |
| AIM | Data-d... | 2026.01 | 60 | 9,934 |
| Average Merging | Data-f... | 2026.01 | 57.8 | 10,099 |
| DARE-Linear | Data-f... | 2026.01 | 56.7 | 12,247 |
| ACM | Data-d... | 2026.01 | 54.4 | 11,080 |
| Qwen3-4B-Instruct | Base M... | 2026.01 | 50 | 7,368 |
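The leaderboard reports accuracy and average response length side by side, so one way to read it is to compare each method against the strongest entry. The sketch below is only an illustration: the helper name `compare_to_baseline` and the choice of Qwen3-4B-Thinking as the reference point are assumptions, and the derived columns (accuracy change, percentage length reduction) do not appear on the leaderboard itself. The numbers are copied from the results listed above.

```python
# Rows copied from the evaluation results: (method, accuracy, avg response length).
RESULTS = [
    ("Qwen3-4B-Thinking", 80.3, 23912),
    ("CoD", 76.7, 17338),
    ("Ada-R1", 68.9, 11969),
    ("Task Arithmetic", 67.8, 11395),
    ("RPAM", 67.8, 10157),
    ("TIES Merging", 60.0, 10891),
    ("AIM", 60.0, 9934),
    ("Average Merging", 57.8, 10099),
    ("DARE-Linear", 56.7, 12247),
    ("ACM", 54.4, 11080),
    ("Qwen3-4B-Instruct", 50.0, 7368),
]


def compare_to_baseline(results, baseline_name):
    """Hypothetical helper: accuracy change and length reduction vs. a baseline row."""
    _, base_acc, base_len = next(r for r in results if r[0] == baseline_name)
    rows = []
    for name, acc, length in results:
        if name == baseline_name:
            continue
        rows.append({
            "method": name,
            # Difference in accuracy relative to the baseline (negative = lower).
            "accuracy_delta": round(acc - base_acc, 1),
            # Percentage by which the average response is shorter than the baseline's.
            "length_reduction_pct": round(100 * (base_len - length) / base_len, 1),
        })
    return rows


if __name__ == "__main__":
    # Baseline choice is an assumption made for this illustration.
    for row in compare_to_baseline(RESULTS, "Qwen3-4B-Thinking"):
        print(f"{row['method']:<20} {row['accuracy_delta']:>6} {row['length_reduction_pct']:>6}%")
```

Choosing a different reference row, such as Qwen3-4B-Instruct, only requires changing the second argument.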