Mathematical Reasoning on AIME25 (Accuracy, Average response length)

80.3 Accuracy

Qwen3-4B-Thinking

Updated 5mo ago

Evaluation Results

Method	Links	Accuracy	Average response length
Qwen3-4B-Thinking 2026.01		80.3	23,912
CoD 2026.01		76.7	17,338
Ada-R1 2026.01		68.9	11,969
Task Arithmetic 2026.01		67.8	11,395
RPAM 2026.01		67.8	10,157
TIES Merging 2026.01		60	10,891
AIM 2026.01		60	9,934
Average Merging 2026.01		57.8	10,099
DARE-Linear 2026.01		56.7	12,247
ACM 2026.01		54.4	11,080
Qwen3-4B-Instruct 2026.01		50	7,368