
Code Reasoning on HumanEval

95.73 HumanEval Score

DeepSeek-R1-Distill-Qwen-14B (Reasoning)

[Chart: HumanEval score over time, Oct 24, 2024 to Jan 9, 2026]
Updated 3d ago

Evaluation Results

Method    Links
2026.01    95.73    -
2026.01    92.41    -
2026.01    92.32    -
2026.01    91.46    -
2026.01    89.57    -
2026.01    89.02    -
2026.01    88.41    -
2024.10    86.6     69.9
2026.01    86.59    -
2026.01    84.31    -
2026.01    84.15    -
2024.10    84.1     69.3
2026.01    82.32    -
2026.01    82.32    -
2025.12    79.9     -
2024.10    79.3     64
2026.01    78.8     -
2025.12    75       -
2026.01    74.39    -
2025.12    73.2     -
2025.12    70.1     -
2026.01    67.66    -
2026.01    64.63    -
2025.12    64       -
2026.01    61.59    -
2026.01    61.59    -
2026.01    59.15    -
2026.01    42.94    -
2026.01    40.85    -
2026.01    38.66    -
2026.01    34.15    -
2026.01    19.51    -
2026.01    19.51    -
2026.01    3.66     -
2026.01    0.61     -