
BigCodeBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Code Generation | BigCodeBench | Accuracy 83.84 | 73 |
| Code Generation | BigCodeBench-Instruct Hard | Pass@1 28.4 | 48 |
| Code Generation | BigCodeBench-Instruct (Full) | Pass@1 0.504 | 48 |
| Code Generation | BigCodeBench-Completion Full | pass@1 59.7 | 41 |
| Code Generation | BigCodeBench Hard | Pass@1 35.1 | 38 |
| Code Generation | BigCodeBench Full | Pass@1 54.2 | 38 |
| Code Generation | BigCodeBench-Completion Hard | pass@1 36.5 | 38 |
| Code Safety Evaluation | BigCodeBench 1.0 (test) | Safety Score 99.9 | 24 |
| Code Evaluation | BigCodeBench | Accuracy 82.02 | 23 |
| Code Generation | BigCodeBench Lite-Pro Compositional Stream | Accuracy 66.7 | 20 |
| Code Generation | BigCodeBench | Mean Accuracy 46.7 | 20 |
| Code Completion | BigCodeBench Hard | Pass@1 16.2 | 20 |
| Code Completion | BigCodeBench Full | Pass@1 46.1 | 20 |
| Code Generation | BigCodeBench | pass@1 88.5 | 18 |
| Code Generation | BigCodeBench | pass@1 41.44 | 18 |
| Code Completion | BigCodeBench | Full Score 45.8 | 17 |
| Code Generation | BigCodeBench Lite-Pro Naive Stream | Accuracy 44.8 | 16 |
| Coding | BigCodeBench Hard | Pass@1 33.8 | 15 |
| Coding | BigCodeBench Full | pass@1 54 | 15 |
| Code Generation | BigCodeBench (BCB) 342 tasks 30% held-out (unseen) | Success Rate (SR) 55.8 | 15 |
| Code Generation | BigCodeBench instruct | Full Score 0.41 | 14 |
| Code Generation | BigCodeBench | avg@32 52.46 | 12 |
| Skill retrieval | BigCodeBench | Recall@1 28.2 | 11 |
| Skill retrieval | BigCodeBench | nDCG@1 73.2 | 11 |
| Code Generation | BigCodeBench-I Hard | Score 28.4 | 11 |
Showing 25 of 50 rows