Mathematical Reasoning on AIME 24 (AUCOAA)
[Chart: AUCOAA by date. Best result: Format-Adaptive-Answer, 81.8 (Jan 6, 2026)]
Updated 4d ago
Evaluation Results
Method                  Backbone  Date     AUCOAA
Format-Adaptive-Answer  Qwen3-8B  2026.01  81.8
Normalized-Length       Qwen3-8B  2026.01  77.4
Adaptive-Answer         Qwen3-8B  2026.01  75.8
Hard-Length 8k → 4k     Qwen3-8B  2026.01  75.1
TWYN                    Qwen3-8B  2026.01  74.5
SFT                     Qwen3-8B  2026.01  73.7
Hard-Length 8k          Qwen3-8B  2026.01  73.3
Soft-Length             Qwen3-8B  2026.01  72.4
Hard-Length 16k         Qwen3-8B  2026.01  70.7
Base model              Qwen3-8B  2026.01  68.6
No-Thinking             Qwen3-8B  2026.01  21.9