Mathematical Reasoning on MATH-500 (AUCOAA)
Chart: AUCOAA by date (Jan 6, 2026). Top result: Normalized-Length, 91.9 AUCOAA.
Evaluation Results
Method | Approach | Date | AUCOAA
--- | --- | --- | ---
Normalized-Length | Normalized le... | 2026.01 | 91.9
Format-Adaptive-Answer | Format-based... | 2026.01 | 91.2
Adaptive-Answer | Rejection sam... | 2026.01 | 90.3
TWYN | Baseline, Bac... | 2026.01 | 89.6
Hard-Length 8k → 4k | Length decay... | 2026.01 | 89.1
SFT | Standard SFT,... | 2026.01 | 87.9
Hard-Length 8k | Hard length c... | 2026.01 | 87.9
Soft-Length | Soft length c... | 2026.01 | 87.9
Hard-Length 16k | Hard length c... | 2026.01 | 83.4
Base model | Standard Reas... | 2026.01 | 81.7
No-Thinking | No CoT, Backb... | 2026.01 | 81.3
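A common way to read this table is relative to the Base model entry (81.7 AUCOAA). The following Python sketch is a reader's convenience, not part of the benchmark's tooling. It transcribes the AUCOAA values listed above and prints each method's gain over that reference. It does not re-run or re-derive the AUCOAA metric itself. The helper name `delta_vs_reference` is hypothetical.

```python
"""Compare each method's MATH-500 AUCOAA against a reference entry.

Scores are transcribed by hand from the leaderboard table; this script
only does arithmetic on them and does not recompute the metric.
"""

# AUCOAA values copied from the leaderboard, in leaderboard order.
SCORES: dict[str, float] = {
    "Normalized-Length": 91.9,
    "Format-Adaptive-Answer": 91.2,
    "Adaptive-Answer": 90.3,
    "TWYN": 89.6,
    "Hard-Length 8k → 4k": 89.1,
    "SFT": 87.9,
    "Hard-Length 8k": 87.9,
    "Soft-Length": 87.9,
    "Hard-Length 16k": 83.4,
    "Base model": 81.7,
    "No-Thinking": 81.3,
}


def delta_vs_reference(scores: dict[str, float], reference: str = "Base model") -> dict[str, float]:
    """Hypothetical helper: each method's score minus the reference score, rounded to 0.1."""
    ref = scores[reference]
    # Rounding hides float noise such as 91.9 - 81.7 == 10.200000000000003.
    return {name: round(score - ref, 1) for name, score in scores.items()}


if __name__ == "__main__":
    deltas = delta_vs_reference(SCORES)
    # Largest gain first; sorted() is stable, so tied methods keep table order.
    for name, d in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<24} {d:+.1f}")
```

Under this reading, Normalized-Length leads the Base model by about 10.2 points. No-Thinking sits about 0.4 points below it.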