
Code Reasoning on MBPP (Accuracy, Token Cost)

Accuracy: 67.3


[Chart: Accuracy over time, Sep 30, 2025 to Dec 17, 2025]
Updated 4d ago

Evaluation Results

Date      Accuracy   Token Cost
2025.12   67.3       -58.7
2025.12   66.1       142.8
2025.12   65.8       156.3
2025.12   64.9       127.6
2025.12   64.5       345.7
2025.12   64.2       -53.5
2025.12   63.4       289.4
2025.12   62.7       187.2
2025.12   61.8       0
2025.12   60.1       -54.2
2025.12   58.5       -50.8
2025.09   38.8       -
2025.09   33.4       -
2025.09   32.2       -
2025.09   31.5       -
2025.09   30.8       -
2025.09   26.3       -
2025.09   24.7       -
2025.09   24.4       -
2025.09   24         -
2025.09   23.7       -
2025.09   19.3       -
2025.09   18.9       -