RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation
About
The task of repository-level code completion is to continue writing unfinished code based on the broader context of the repository. For automated code completion tools, however, it is difficult to make use of the relevant information scattered across different files. We propose RepoCoder, a simple, generic, and effective framework to address this challenge. It streamlines the repository-level code completion process by incorporating a similarity-based retriever and a pre-trained code language model in an iterative retrieval-generation pipeline. RepoCoder makes effective use of repository-level information for code completion and can generate code at various levels of granularity. Moreover, we propose a new benchmark, RepoEval, which consists of recent, high-quality real-world repositories covering line, API invocation, and function body completion scenarios. Experimental results indicate that RepoCoder improves the In-File completion baseline by over 10% in all settings and consistently outperforms the vanilla retrieval-augmented code completion approach. Furthermore, we validate the effectiveness of RepoCoder through comprehensive analysis, providing valuable insights for future research. Our source code and benchmark are publicly available: https://github.com/microsoft/CodeT/tree/main/RepoCoder
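The iterative retrieval-generation idea can be illustrated with a minimal sketch. This is not the released RepoCoder implementation. It assumes a token-level Jaccard similarity as the retriever, a caller-supplied `generate` function standing in for the code language model, and that each round's generated completion is appended to the next retrieval query. All function names and parameters here (`tokenize`, `jaccard`, `retrieve`, `iterative_completion`, `top_k`, `iterations`) are hypothetical and chosen for illustration.

```python
import re
from typing import Callable, List, Set


def tokenize(code: str) -> Set[str]:
    """Reduce a code snippet to its set of word-like tokens."""
    return set(re.findall(r"\w+", code))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def retrieve(query: str, chunks: List[str], top_k: int = 2) -> List[str]:
    """Return the top_k repository chunks most similar to the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: jaccard(query_tokens, tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def iterative_completion(
    unfinished: str,
    chunks: List[str],
    generate: Callable[[str], str],  # placeholder for a code language model
    iterations: int = 2,
    top_k: int = 2,
) -> str:
    """Alternate retrieval and generation for a fixed number of rounds."""
    query = unfinished
    completion = ""
    for _ in range(iterations):
        # Retrieve repository context for the current query.
        context = retrieve(query, chunks, top_k)
        # Build the prompt: retrieved chunks first, then the unfinished code.
        prompt = "\n".join(context) + "\n" + unfinished
        completion = generate(prompt)
        # Assumption: the next query is the unfinished code plus the
        # completion generated in this round.
        query = unfinished + completion
    return completion
```

In this sketch the first round retrieves with the unfinished code alone, while later rounds can surface different chunks because the query now includes the model's previous guess. A real system would replace `generate` with a call to an actual model and would likely use a stronger retriever.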
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| API Invocation Completion | RepoEval 1.0 (test) | Exact Match | 49.56 | 24 |
| Line Completion | RepoEval 1.0 (test) | Exact Match | 57 | 24 |
| Repo-level Method Body Completion | RAMBO's benchmark (test) | BLEU | 50.04 | 21 |
| Repository-level Method Body Completion | Defect4J | BLEU | 63.52 | 21 |
| Code Generation | CoderEval-Python class-runnable | Pass@1 | 35.45 | 16 |
| Code Generation | CoderEval-Python file-runnable | Pass@1 | 29.41 | 8 |
| Repo-level Method Body Completion | Repo-Eval Python (test) | BLEU | 52.71 | 7 |
| Function Body Completion | RepoEval Function Body Completion (All) | Pass Rate | 42.63 | 6 |
| Code Completion | CrossCodeEval Project-level (test) | C-EM | 8.52 | 4 |