Code Reasoning on MBPP

84.7 MBPP Execution Accuracy

Qwen2.5-Coder-CFB-Aug

[Chart: scores over time, Oct 24, 2024 to Dec 17, 2025]
Updated 8d ago

Evaluation Results

Method    Links
2024.10   84.7   69.3   -      -
2024.10   83.1   69.9   -      -
2024.10   77.2   64     -      -
2025.12   73.8   -      -      -
2025.12   73.2   -      -      -
2025.12   72.6   -      -      -
2025.12   72.6   -      -      -
2025.12   72.1   -      -      -
2025.12   71.9   -      -      -
2025.12   71.7   -      -      -
2025.12   71.6   -      -      -
2025.12   71.2   -      -      -
2025.12   71     -      -      -
2025.12   70.5   -      -      -
2025.12   70.4   -      -      -
2025.12   70.3   -      -      -
2025.12   70     -      -      -
2025.12   69.6   -      -      -
2025.12   69.5   -      -      -
2025.12   69.5   -      -      -
2025.12   68.9   -      -      -
2025.12   68.9   -      -      -
2025.12   68.8   -      -      -
2025.12   68.5   -      -      -
2025.12   68.3   -      -      -
2025.12   67.8   -      -      -
2025.12   67.1   -      -      -
2025.12   66.9   -      -      -
2025.12   66.4   -      -      -
2025.12   64.7   -      -      -
2025.12   59.3   -      -      -
2025.12   58.7   -      -      -
2025.12   25     -      -      -
2026.05   -      -      61     66.7
2026.05   -      -      63.2   66.7
2026.05   -      -      62.2   65.7
2026.05   -      -      62.6   64.6
2026.05   -      -      64     66.67
2026.05   -      -      64     65.66
2026.05   -      -      66.2   66.67
2026.05   -      -      62.3   67.68
2026.05   -      -      68     50.5
2026.05   -      -      66.8   32.32
2026.05   -      -      38.2   27.27
2026.05   -      -      56.3   12.12
2026.05   -      -      59     67.7
2026.05   -      -      60.8   68.7
2026.05   -      -      58.2   67.7
2026.05   -      -      61.7   68.7