
RepoBench

Benchmarks

Task Name                        | Dataset Name     | SOTA Result                    | Trend
---------------------------------|------------------|--------------------------------|------
Long-context language generation | RepoBench-P      | Average Acceptance Length 4.46 | 25
Long Code Completion             | RepoBench >8k    | Edit Sim 51.24                 | 12
Long Code Completion             | RepoBench 4k-8k  | Edit Similarity 53.3           | 12
Long Code Completion             | RepoBench 0-4k   | Edit Similarity 52.82          | 12
Code Completion                  | RepoBench-P      | Similarity 0.7305              | 10
Repository-level code-completion | RepoBench (test) | Exact-match Accuracy 65.9      | 7
Coding                           | RepoBench        | Pass@1 25.3                    | 6
code generation                  | RepoBench P      | Score 15.04                    | 5
Code Completion                  | RepoBench        | Pass@k Score 48.92             | 1