
Unified Pre-training for Program Understanding and Generation

About

Code summarization and generation empower conversion between programming language (PL) and natural language (NL), while code translation avails the migration of legacy code from one PL to another. This paper introduces PLBART, a sequence-to-sequence model capable of performing a broad spectrum of program and language understanding and generation tasks. PLBART is pre-trained on an extensive collection of Java and Python functions and associated NL text via denoising autoencoding. Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models. Moreover, experiments on discriminative tasks, e.g., program repair, clone detection, and vulnerable code detection, demonstrate PLBART's effectiveness in program understanding. Furthermore, analysis reveals that PLBART learns program syntax, style (e.g., identifier naming convention), and logical flow (e.g., an if block inside an else block is equivalent to an else if block), which are crucial to program semantics, and thus excels even with limited annotations.

Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang • 2021
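
The abstract says PLBART is pre-trained via denoising autoencoding but does not spell out the noise function. As a rough illustration only, the sketch below assumes plain token masking: a fraction of input tokens is replaced by a mask symbol, and the model would be trained to reconstruct the original sequence. The function name `mask_tokens`, the `<mask>` symbol, and the 0.35 ratio are assumptions made for this example, not details taken from this page.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, chosen for this sketch


def mask_tokens(tokens, mask_ratio=0.35, seed=0):
    """Build one (noisy input, clean target) pair for denoising training.

    Assumption: noise is simple token masking; mask_ratio is illustrative.
    """
    rng = random.Random(seed)
    n_mask = int(len(tokens) * mask_ratio)
    # Pick distinct positions to corrupt.
    positions = set(rng.sample(range(len(tokens)), n_mask))
    noisy = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    # The target is the uncorrupted sequence the decoder should reproduce.
    return noisy, list(tokens)


source = "def add ( a , b ) : return a + b".split()
noisy, target = mask_tokens(source)
print(" ".join(noisy))
print(" ".join(target))
```

A sequence-to-sequence model trained on such pairs reads the corrupted sequence in the encoder and is penalized for every target token the decoder fails to recover.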

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Code Summarization | CodeXGLUE | Java Score | 18.45 | 38 |
| Code Summarization | CodeSearchNet-Java (CSN) CodeXGLUE (test) | Smoothed BLEU-4 | 18.45 | 38 |
| Clone Detection | POJ-104 CodeXGLUE (test) | MAP@R | 86.27 | 17 |
| Code Generation | Concode CodeXGLUE (test) | EM | 18.6 | 14 |
| Code Infilling | HumanEval multi-line code infilling | Pass Rate | 13.1 | 12 |
| Code Infilling | HumanEval single-line infilling (test) | Pass Rate | 0.416 | 12 |
| Clone Detection | BigCloneBench CodeXGLUE (test) | F1 Score | 97.2 | 11 |
| Code Summarization | CodeXGLUE Python (test) | BLEU-4 | 19.3 | 11 |
| Code Summarization | CodeXGLUE Java (test) | BLEU-4 | 18.45 | 11 |
| Docstring Generation | CodeXGLUE Python (test) | BLEU | 19.3 | 11 |
Showing 10 of 35 rows
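
The EM (exact match) result listed for Concode counts a generated program as correct only when it matches the reference exactly. A minimal sketch of that calculation follows; the function name `exact_match` and the whitespace stripping are assumptions for illustration, and the benchmark's official scorer may normalize text differently.

```python
def exact_match(predictions, references):
    """Return the percentage of predictions identical to their reference.

    Assumption: leading and trailing whitespace is ignored before comparing.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


preds = ["int add(int a, int b) { return a + b; }", "void f() {}"]
refs = ["int add(int a, int b) { return a + b; }", "void g() {}"]
print(exact_match(preds, refs))
```

Because a single differing token counts as a miss, EM scores tend to sit well below overlap-based metrics such as BLEU on the same outputs.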
