MBPP

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Code Generation | MBPP (test) | Pass@1 95.1 | 276
Code Generation | MBPP | Pass@1 87.6 | 175
Code Generation | MBPP | Accuracy (%) 92.2 | 146
Code Generation | MBPP+ | Pass@1 83.6 | 122
Code Generation | MBPP | Accuracy 79.8 | 120
Code Generation | MBPP | Pass@1 91.8 | 113
Code Generation | MBPP | Accuracy 96.6 | 90
Code Generating | MBPP | Pass@1 83.1 | 88
Code Generation | MBPP Plus (test) | Accuracy 83.6 | 87
Code Generation | MBPP-ET | Pass@1 91.8 | 75
Code Generation | MBPP+ | Accuracy 75.9 | 75
Code Generation | MBPP Sanitized | Accuracy 85.7 | 51
Function-level Code Generation | MBPP+ augmented (test) | Pass@1 79.6 | 45
Code | MBPP | Pass@1 77.9 | 43
Code Generation | MBPP+ | Score 94.2 | 43
Code Generation | MBPP | Score 58 | 38
Coding | MBPP+ | Pass@1 86.21 | 37
Code Generation | MBPP | MBPP Score 66.17 | 35
Code Reasoning | MBPP | MBPP Execution Accuracy 84.7 | 33
Code Completion | MBPP+ | Pass@1 65.6 | 33
Code Generation | MBPP v1 (test) | Pass@1 68.9 | 33
Code Generation | MBPP | Accuracy 58 | 32
Code Verification | MBPP+ | Pass@1 76.93 | 32
Python Coding | MBPP standard (test) | Pass@1 Accuracy 85.25 | 32
Coding | MBPP | Accuracy 98.4 | 31
Showing 25 of 137 rows