
Code Reasoning on MBPP (Accuracy, Token Cost)

Accuracy: 67.3


[Chart: results over time, Sep 30, 2025 to Dec 17, 2025]
Updated 1mo ago

Evaluation Results

Date      Accuracy   Token Cost
2025.12   67.3       -58.7
2025.12   66.1       142.8
2025.12   65.8       156.3
2025.12   64.9       127.6
2025.12   64.5       345.7
2025.12   64.2       -53.5
2025.12   63.4       289.4
2025.12   62.7       187.2
2025.12   61.8       0
2025.12   60.1       -54.2
2025.12   58.5       -50.8
2025.09   38.8       -
2025.09   33.4       -
2025.09   32.2       -
2025.09   31.5       -
2025.09   30.8       -
2025.09   26.3       -
2025.09   24.7       -
2025.09   24.4       -
2025.09   24         -
2025.09   23.7       -
2025.09   19.3       -
2025.09   18.9       -