
RepoBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Completion | RepoBench-P LongBench | Pass@1: 52.05 | 30 |
| Long-context language generation | RepoBench-P | Average Acceptance Length: 4.46 | 25 |
| Code Completion | RepoBench-P | Similarity: 0.7305 | 17 |
| Code Generation | RepoBench | Speedup: 3.57 | 12 |
| Long Code Completion | RepoBench >8k | Edit Sim: 51.24 | 12 |
| Long Code Completion | RepoBench 4k-8k | Edit Similarity: 53.3 | 12 |
| Long Code Completion | RepoBench 0-4k | Edit Similarity: 52.82 | 12 |
| Long-context code completion | RepoBench-P | MAT: 1.83 | 11 |
| Repository-level code-completion | RepoBench (test) | Exact-match Accuracy: 65.9 | 7 |
| Code Generation | RepoBench-P Python, XF-Random | Execution Match (EM): 64.5 | 6 |
| Code Generation | RepoBench-P Python XF-First | Exact Match (EM): 52.4 | 6 |
| Coding | RepoBench | Pass@1: 25.3 | 6 |
| code generation | RepoBench P | Score: 15.04 | 5 |
| Code Completion | RepoBench | Pass@k Score: 48.92 | 1 |
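Several rows above report Exact Match and Edit Similarity, but the page does not define either metric. The sketch below shows one common way such scores are computed for next-line code completion, on a 0-100 scale. It is an illustration under stated assumptions, not the scoring code behind any listed result: whitespace stripping, normalizing by the longer string, and the helper names `exact_match`, `edit_similarity`, and `score` are all hypothetical choices made here.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def exact_match(prediction: str, reference: str) -> bool:
    """Assumption: leading and trailing whitespace is ignored before comparing."""
    return prediction.strip() == reference.strip()


def edit_similarity(prediction: str, reference: str) -> float:
    """Assumption: similarity is 1 minus the edit distance over the longer length."""
    pred, ref = prediction.strip(), reference.strip()
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(pred, ref) / longest


def score(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (exact match %, mean edit similarity %) over (prediction, reference) pairs."""
    n = len(pairs)
    em = 100.0 * sum(exact_match(p, r) for p, r in pairs) / n
    es = 100.0 * sum(edit_similarity(p, r) for p, r in pairs) / n
    return em, es
```

Calling `score([("return x", "return x"), ("return y", "return x")])` gives one exact match out of two pairs plus a high edit similarity for the near miss, which is why the two metrics are often reported side by side. Published results may normalize differently (for example by the sum of both lengths), so numbers from this sketch are not directly comparable to the table.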