
MultiPL-E

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Code Correctness Prediction | MultiPL-E Java | AUROC | 0.705 | 60 |
| Code Correctness Prediction | MultiPL-E Java | Brier Score | 0.231 | 60 |
| Code Correctness Prediction | MultiPL-E Java | ECE | 0.075 | 60 |
| Code Generation | MultiPL-E | Average Score | 76.5 | 47 |
| Coding | MultiPL-E | Score | 87.9 | 31 |
| Code Generation | MultiPL-E | Average Pass@1 | 79.5 | 19 |
| Code Generation | MultiPL-E HumanEval translated from Python | C++ Pass Rate | 54.6 | 17 |
| Multilingual Code Completion | MultiPL-E | Pass@1 | 31.14 | 12 |
| Multilingual Code Generation | MultiPL-E | MultiPL-E | 72.84 | 10 |
| Code Generation | MultiPL-E 2022 (test) | Java | 44.9 | 10 |
| Code Generation | MultiPL-E MBPP | Score | 58.8 | 9 |
| Code Generation | MultiPL-E Java | Pass@1 | 42.07 | 6 |
| Code Generation | MultiPL-E | Pass@1 (Lua) | 42 | 6 |
| Code Generation | MultiPL-E 7 langs | Score (%) | 26 | 5 |
| Code Generation | MultiPL-E | Pass@1 | 61.1 | 5 |
| Code Synthesis | MultiPL-E | Success Rate (Lua) | 68 | 5 |
| Code Generation | MultiPL-E | Accuracy | 59.6 | 5 |
| Single-Line Code Infilling | MultiPL-E | Python SPM Exact Match | 74.5 | 5 |
| Code Generation | MultiPL-E v1 (test) | Accuracy | 59.1 | 3 |
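Several rows report pass@1. A common way to compute pass@k when n completions are sampled per problem and c of them pass the unit tests is the unbiased estimator introduced alongside HumanEval, 1 - C(n-c, k) / C(n, k), averaged over problems. The Python sketch below assumes that estimator; individual results in the table may use different sampling settings, and the function names are hypothetical.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the chance that at least one of k samples, drawn
    without replacement from n generations of which c pass, is correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[tuple[int, int]], k: int) -> float:
    """Benchmark-level pass@k: mean of per-problem estimates from (n, c) pairs."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to c / n, so `pass_at_k(10, 3, 1)` is 0.3; a per-language pass@1 such as the Lua row is then the mean of these per-problem values over that language's problems.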