
Textual Mathematical Reasoning on AIME25 (test)

86.1 Mean@5

Naive-KD

[Chart: Mean@5 scores; y-axis ticks 66.548, 71.624, 76.7, 81.776; x-axis: Mar 25, 2026]
Updated 19d ago

Evaluation Results

Method    Date       Mean@5    Links
          2026.03    86.1
          2026.03    84.6
          2026.03    81.5
          2026.03    79.2
          2026.03    73.3
          2026.03    67.3