| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Long-context language generation | RepoBench-P | Average Acceptance Length: 4.46 | 25 |
| Long Code Completion | RepoBench >8k | Edit Sim: 51.24 | 12 |
| Long Code Completion | RepoBench 4k-8k | Edit Similarity: 53.3 | 12 |
| Long Code Completion | RepoBench 0-4k | Edit Similarity: 52.82 | 12 |
| Code Completion | RepoBench-P | Similarity: 0.7305 | 10 |
| Repository-level code-completion | RepoBench (test) | Exact-match Accuracy: 65.9 | 7 |
| Coding | RepoBench | Pass@1: 25.3 | 6 |
| code generation | RepoBench P | Score: 15.04 | 5 |
| Code Completion | RepoBench | Pass@k Score: 48.92 | 1 |