Mathematical Reasoning on AIME 2025 (held-out)

66.7 DBS

DataChef-32B

Updated 4mo ago

Evaluation Results

Method	Links	DBS
DataChef-32B 2026.02		66.7	-
SOURCEbest 2026.02		39.6	-
EXPERT 2026.02		33.3	-
Gemini-3-Pro 2026.02		30	80.7
DataChef-32B 2026.02		30	84.7
SOURCEavg 2026.02		23.4	-
Qwen3-Next ⊕ Kimi-K2 2026.02		23.3	78.3
Kimi-K2 2026.02		20	35.4
Qwen3-32B 2026.02		13.3	31.5