
Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation

About

Despite recent progress made by large language models in code generation, they still struggle to produce programs that meet complex requirements. Recent work uses plan-and-solve decomposition to reduce complexity and leverages self-tests to refine the generated program. Yet planning for requirements buried deep in a task in advance can be challenging, and the tests need to be accurate for self-improvement to work. To this end, we propose FunCoder, a code generation framework incorporating the divide-and-conquer strategy with functional consensus. Specifically, FunCoder recursively branches off sub-functions as smaller goals during code generation, represented by a tree hierarchy. These sub-functions are then composed to attain more complex objectives. Additionally, we designate functions via a consensus formed by identifying similarities in program behavior, mitigating error propagation. FunCoder outperforms state-of-the-art methods by +9.8% on average on HumanEval, MBPP, xCodeEval and MATH with GPT-3.5 and GPT-4. Moreover, our method demonstrates superiority on smaller models: with FunCoder, StableCode-3b surpasses GPT-3.5 by +18.6% and achieves 97.7% of GPT-4's performance on HumanEval. Further analysis reveals that our proposed dynamic function decomposition is capable of handling complex requirements, and that functional consensus prevails over self-testing in correctness evaluation.
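To make the idea of functional consensus more concrete, here is a minimal sketch of selecting one candidate implementation by agreement in observed behavior. It is an illustration under stated assumptions, not FunCoder's actual implementation: the names `behavior`, `similarity`, and `select_by_consensus`, the exact-match similarity measure, and the hand-written candidates and inputs are all hypothetical. In the real setting the candidates would presumably be sampled from a language model.

```python
from typing import Any, Callable, List, Sequence, Tuple

Outcome = Tuple[str, Any]


def behavior(fn: Callable[..., Any], inputs: Sequence[tuple]) -> List[Outcome]:
    """Run fn on every input and record its output, or a marker if it raises."""
    outcomes: List[Outcome] = []
    for args in inputs:
        try:
            outcomes.append(("ok", fn(*args)))
        except Exception as exc:
            outcomes.append(("error", type(exc).__name__))
    return outcomes


def similarity(a: List[Outcome], b: List[Outcome]) -> int:
    """Count inputs where both candidates succeeded and returned equal values.

    Assumption: exact equality is the similarity measure, and shared
    failures do not count as agreement.
    """
    return sum(1 for x, y in zip(a, b) if x[0] == "ok" and x == y)


def select_by_consensus(
    candidates: Sequence[Callable[..., Any]], inputs: Sequence[tuple]
) -> Callable[..., Any]:
    """Return the candidate whose behavior agrees most with all the others."""
    behaviors = [behavior(fn, inputs) for fn in candidates]
    scores = [
        sum(similarity(bi, bj) for j, bj in enumerate(behaviors) if j != i)
        for i, bi in enumerate(behaviors)
    ]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


# Hypothetical candidates for "absolute value"; the third one is buggy.
def abs_a(x):
    return x if x >= 0 else -x


def abs_b(x):
    return abs(x)


def abs_buggy(x):
    return x


chosen = select_by_consensus([abs_a, abs_b, abs_buggy], [(-2,), (0,), (3,)])
```

In this toy run the two correct candidates agree on every input while the buggy one disagrees on the negative input, so a correct candidate is selected without any reference tests. This only helps when correct candidates agree with each other more often than incorrect ones do.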

Jingchang Chen, Hongxuan Tang, Zheng Chu, Qianglong Chen, Zekun Wang, Ming Liu, Bing Qin • 2024

Related benchmarks

Task                    Dataset                 Result                  Rank
Code Generation         HumanEval (test)        Pass@1: 94.5            444
Mathematical Reasoning  MATH (test)             Overall Accuracy: 78.2  433
Code Generation         MBPP sample 200         Pass@1: 79.5            18
Code Generation         xCodeEval (sample 500)  Accuracy (Easy): 0.831  17
