
Code Reasoning on CRUXEval

68.6 Accuracy

Qwen2.5-Math-72B-Instruct

[Chart: accuracy over time, Feb 18, 2025 to Dec 18, 2025]
Updated 4d ago

Evaluation Results

Date      Accuracy
2025.02   68.6
2025.02   65.2
2025.02   59.6
2025.12   52.9
2025.12   52.6
2025.12   52.5
2025.12   51
2025.02   50.9
2025.12   50.2
2025.02   50
2025.12   48.8
2025.12   48.6
2025.02   48
2025.12   48
2025.12   48
2025.12   46.9
2025.12   45.6
2025.12   44.8
2025.02   40.8
2025.02   35.1
2025.02   28