
Mathematical Reasoning on AIME 2025 (Average Accuracy @ 32 samples)

Best result: 13.9 Average Accuracy (32 samples), WIST

Updated 4mo ago

Evaluation Results

Method	Date	Average Accuracy (32 samples)
WIST	2026.03	13.9
SPICE	2026.03	13.4
R-Zero	2026.03	12.8
Base Model	2026.03	11.3
SPICE	2026.03	10.9
WIST	2026.03	9.7
R-Zero	2026.03	7.1
Base Model	2026.03	6.4
R-Zero	2026.03	1.5
WIST	2026.03	1.4
Base Model	2026.03	1.1
SPICE	2026.03	0.9
SPICE	2026.03	0.8
Base Model	2026.03	0.6
WIST	2026.03	0.6
R-Zero	2026.03	0.4
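
The page does not define how "Average Accuracy @ 32 samples" is computed. A common reading, assumed here, is that each problem is answered 32 times, the fraction of correct samples is taken per problem, and those fractions are averaged over all problems and reported as a percentage. The function name `average_accuracy_at_k` and the input layout are hypothetical and only illustrate that assumption.

```python
from typing import Sequence


def average_accuracy_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Return average accuracy as a percentage.

    correct[i][j] is True when sample j for problem i was graded correct.
    Assumption: every problem has the same number of samples (e.g. 32).
    """
    if not correct or any(len(row) == 0 for row in correct):
        raise ValueError("need at least one problem and one sample per problem")
    # Fraction of correct samples for each problem.
    per_problem = [sum(row) / len(row) for row in correct]
    # Mean over problems, scaled to a percentage.
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy example with 2 problems and 4 samples each (not real benchmark data).
toy = [
    [True, False, False, False],   # 1 of 4 correct
    [False, False, False, False],  # 0 of 4 correct
]
print(average_accuracy_at_k(toy))
```

Under this reading, a score such as 13.9 would mean that about 13.9% of sampled answers were correct on average across problems, which is a different quantity from the share of problems solved at least once.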