CoRoVA: Compressed Representations for Vector-Augmented Code Completion
About
Retrieval-augmented generation (RAG) has emerged as one of the most effective approaches for enhancing code completion, especially when repository-level context is important. However, appending retrieved context significantly lengthens the input sequence, which raises prefill cost and degrades time-to-first-token (TTFT) -- a critical limitation for interactive settings such as IDEs. In this work, we introduce CoRoVA, a framework that compresses retrieved context into compact, semantically rich representations that remain interpretable to code LLMs. This improves generation quality while reducing prompt augmentation to only a few compressed single-token vectors. Our approach requires training only a small projector module and introduces negligible additional latency, yet it significantly improves the prediction quality of code LLMs. Our experiments show that CoRoVA enables a 20-38% reduction in TTFT on completion tasks compared to uncompressed RAG.
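The idea above can be illustrated with a minimal sketch: instead of prepending retrieved text to the prompt, a small trainable projector maps retrieved chunk embeddings into a handful of vectors in the LLM's embedding space, each occupying a single token position. All dimensions, the pooling step, and the linear projector below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Hypothetical dimensions (illustrative, not from the paper):
# retrieval-encoder dim, code-LLM embedding dim, compressed token count.
D_RETRIEVER = 384
D_MODEL = 1024
NUM_SOFT_TOKENS = 4

rng = np.random.default_rng(0)

# The projector is the only trainable component in this sketch:
# a single linear map from a pooled retrieval embedding to
# NUM_SOFT_TOKENS vectors in the LLM's embedding space.
W = rng.standard_normal((D_RETRIEVER, NUM_SOFT_TOKENS * D_MODEL)) * 0.02
b = np.zeros(NUM_SOFT_TOKENS * D_MODEL)

def compress_context(chunk_embeddings: np.ndarray) -> np.ndarray:
    """Compress retrieved chunk embeddings of shape (n, D_RETRIEVER)
    into (NUM_SOFT_TOKENS, D_MODEL) soft-token vectors that can be
    prepended to the code LLM's input embeddings in place of the
    much longer raw retrieved text."""
    pooled = chunk_embeddings.mean(axis=0)         # (D_RETRIEVER,)
    soft = pooled @ W + b                          # (k * D_MODEL,)
    return soft.reshape(NUM_SOFT_TOKENS, D_MODEL)  # (k, D_MODEL)

# Usage: ten retrieved chunks collapse into four single-token vectors,
# so the prompt grows by 4 positions instead of hundreds of text tokens.
chunks = rng.standard_normal((10, D_RETRIEVER))
soft_tokens = compress_context(chunks)
print(soft_tokens.shape)  # (4, 1024)
```

Because the sequence grows by only `NUM_SOFT_TOKENS` positions rather than the full retrieved text, prefill work (and hence TTFT) stays close to the no-retrieval baseline.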
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Function Completion | RepoEval amazon-science patchcore-inspection | Pass@5 | 66.94 | 3 |
| Function Completion | RepoEval leopard (ai-betty) | Pass@5 | 64.99 | 3 |
| Function Completion | RepoEval deepmind tracr | Pass@5 | 52.76 | 3 |