Retrieval Augmented Code Generation and Summarization

About

Software developers write a lot of source code and documentation during software development. They often recall parts of source code or code summaries that they wrote in the past while implementing or documenting software. To mimic this code and summary generation behavior, we propose a retrieval augmented framework, REDCODER, that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models. REDCODER is unique in two ways. First, it extends the state-of-the-art dense retrieval technique to search for relevant code or summaries. Second, it can work with retrieval databases that include unimodal instances (only code or only natural language descriptions) or bimodal instances (code-description pairs). We conduct experiments and extensive analysis on two benchmark datasets of code generation and summarization in Java and Python, and the promising results support the effectiveness of our proposed retrieval augmented framework.
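The retrieve-then-augment idea described above can be illustrated with a minimal sketch. This is not the REDCODER implementation: the bag-of-words `embed` function is a toy stand-in for a learned dense encoder, and the names `retrieve`, `augment`, the `key`/`value` database layout, and the `[SEP]` separator are all assumptions made for illustration. Each entry here is bimodal (a description as the key, code as the value); a unimodal database could be modeled by using the same text for both fields.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, database, k=2):
    """Return the k database entries whose keys are most similar to the query."""
    q = embed(query)
    ranked = sorted(
        database,
        key=lambda item: cosine(q, embed(item["key"])),
        reverse=True,
    )
    return ranked[:k]


def augment(query, retrieved, sep=" [SEP] "):
    """Append the retrieved values to the query to form the generator input."""
    return sep.join([query] + [item["value"] for item in retrieved])


# Hypothetical bimodal retrieval database: description -> code.
database = [
    {"key": "read a text file and return its lines",
     "value": "def read_lines(path): return open(path).read().splitlines()"},
    {"key": "compute the sum of a list of numbers",
     "value": "def total(xs): return sum(xs)"},
    {"key": "reverse a string",
     "value": "def reverse(s): return s[::-1]"},
]

query = "return the sum of numbers in a list"
top = retrieve(query, database, k=1)
augmented_input = augment(query, top)
print(augmented_input)
```

In a real system, `augmented_input` would be passed to a code generation model. For summarization the roles would swap, with code as the query and retrieved summaries as the supplement.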

Md Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang • 2021

Related benchmarks

Task	Dataset	Result
Code Generation	Concode CodeXGLUE (test)	EM 23.4	14
Code Summarization	CodeXGLUE Python (test)	BLEU-4 21.01	11
Code Summarization	CodeXGLUE Java (test)	BLEU-4 22.95	11
Code Generation	CodeXGLUE Python (test)	CodeBLEU 18.31	10
Code Generation	Java deduplicated retrieval codebase (test)	EM 10.21	9
Code Generation	Python deduplicated retrieval codebase (test)	EM 961	9
Code Comment Generation	Python (test)	BLEU 21.01	8
Code Comment Generation	Java (test)	BLEU Score 22.94	8
