Mathematical Reasoning on AIME 25 (AUCOAA)
[Chart: AUCOAA by date. Labeled point: Adaptive-Answer, 80 (Jan 6, 2026).]
Updated 4d ago
Evaluation Results
| Method | Backbone | Date | AUCOAA |
| --- | --- | --- | --- |
| Adaptive-Answer | Qwen3-8B | 2026.01 | 80 |
| Format-Adaptive-Answer | Qwen3-8B | 2026.01 | 80 |
| TWYN | Qwen3-8B | 2026.01 | 79.6 |
| Hard-Length 8k | Qwen3-8B | 2026.01 | 77.2 |
| SFT | Qwen3-8B | 2026.01 | 76.9 |
| Hard-Length 16k | Qwen3-8B | 2026.01 | 76.3 |
| Base model | Qwen3-8B | 2026.01 | 75.8 |
| Soft-Length | Qwen3-8B | 2026.01 | 72.9 |
| Hard-Length 8k → 4k | Qwen3-8B | 2026.01 | 69.8 |
| Normalized-Length | Qwen3-8B | 2026.01 | 61 |
| No-Thinking | Qwen3-8B | 2026.01 | 16 |