
Code Reasoning on CRUXEval

68.6 Accuracy

Qwen2.5-Math-72B-Instruct

[Chart: accuracy over time, Feb 18, 2025 to Dec 18, 2025]
Updated 1mo ago

Evaluation Results

Date      Accuracy
2025.02   68.6
2025.02   65.2
2025.02   59.6
2025.12   52.9
2025.12   52.6
2025.12   52.5
2025.12   51
2025.02   50.9
2025.12   50.2
2025.02   50
2025.12   48.8
2025.12   48.6
2025.02   48
2025.12   48
2025.12   48
2025.12   46.9
2025.12   45.6
2025.12   44.8
2025.02   40.8
2025.02   35.1
2025.02   28