Mathematical Reasoning on AIME 24 (AUCOAA)

81.8 AUCOAA

Format-Adaptive-Answer

Updated 4mo ago

Evaluation Results

Method	Links	AUCOAA
Format-Adaptive-Answer 2026.01		81.8
Normalized-Length 2026.01		77.4
Adaptive-Answer 2026.01		75.8
Hard-Length 8k → 4k 2026.01		75.1
TWYN 2026.01		74.5
SFT 2026.01		73.7
Hard-Length 8k 2026.01		73.3
Soft-Length 2026.01		72.4
Hard-Length 16k 2026.01		70.7
Base model 2026.01		68.6
No-Thinking 2026.01		21.9