Code Reasoning on MBPP

84.7 MBPP Execution Accuracy

Qwen2.5-Coder-CFB-Aug

[Chart: MBPP Execution Accuracy over time, Oct 24, 2024 to Dec 17, 2025]
Updated 4d ago

Evaluation Results

Date      MBPP Execution Accuracy   Second metric
2024.10   84.7                      69.3
2024.10   83.1                      69.9
2024.10   77.2                      64
2025.12   73.8                      -
2025.12   73.2                      -
2025.12   72.6                      -
2025.12   72.6                      -
2025.12   72.1                      -
2025.12   71.9                      -
2025.12   71.7                      -
2025.12   71.6                      -
2025.12   71.2                      -
2025.12   71                        -
2025.12   70.5                      -
2025.12   70.4                      -
2025.12   70.3                      -
2025.12   70                        -
2025.12   69.6                      -
2025.12   69.5                      -
2025.12   69.5                      -
2025.12   68.9                      -
2025.12   68.9                      -
2025.12   68.8                      -
2025.12   68.5                      -
2025.12   68.3                      -
2025.12   67.8                      -
2025.12   67.1                      -
2025.12   66.9                      -
2025.12   66.4                      -
2025.12   64.7                      -
2025.12   59.3                      -
2025.12   58.7                      -
2025.12   25                        -