AutoCoder: Enhancing Code Large Language Model with \textsc{AIEV-Instruct}

About

We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the HumanEval benchmark ($\mathbf{90.9\%}$ vs. $\mathbf{90.2\%}$). In addition, AutoCoder offers a more versatile code interpreter than GPT-4 Turbo and GPT-4o: its code interpreter can install external packages instead of being limited to built-in packages. AutoCoder's training data is a multi-turn dialogue dataset created by a system combining agent interaction and external code execution verification, a method we term \textbf{\textsc{AIEV-Instruct}} (Instruction Tuning with Agent-Interaction and Execution-Verified). Compared to previous large-scale code dataset generation methods, \textsc{AIEV-Instruct} reduces dependence on proprietary large models and provides an execution-validated code dataset. The code and the demo video are available at \url{https://github.com/bin123apple/AutoCoder}.

Bin Lei, Yuchen Li, Qiuwu Chen • 2024
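
For readers unfamiliar with the pass@1 metric quoted in the abstract, the following is a minimal sketch of how such a score is commonly computed. It is not taken from the AutoCoder repository. The function names `pass_at_k` and `mean_pass_at_1` are hypothetical, and the solved/unsolved counts in the usage lines are illustrative values chosen only because they round to the reported percentages on a 164-problem benchmark such as HumanEval.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed all tests, k: budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_1(solved: list[bool]) -> float:
    """Benchmark-level pass@1 with one sample per problem:
    the fraction of problems whose single sample passed."""
    if not solved:
        raise ValueError("need at least one problem")
    return sum(pass_at_k(1, int(ok), 1) for ok in solved) / len(solved)


# Illustrative counts only (assumption, not reported in the source):
# 149 of 164 problems solved rounds to 90.9%, 148 of 164 to 90.2%.
score_a = mean_pass_at_1([True] * 149 + [False] * 15)
score_b = mean_pass_at_1([True] * 148 + [False] * 16)
print(f"{score_a:.1%} vs. {score_b:.1%}")
```

With one sample per problem, pass@1 reduces to the plain fraction of problems solved, so under these assumed counts a 0.7-point gap corresponds to a single additional solved problem.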

Related benchmarks

Task             Dataset                      Result                                Rank
Code Generation  BIRD-Python Verified         Execution Accuracy (Simple): 0.0865   14
Code Generation  BIRD-Python Original (dev)   Execution Accuracy (Simple): 0.075    14
SQL Generation   BIRD Verified                Execution Accuracy (Simple): 13.84    14
SQL Generation   BIRD Original (dev)          Execution Accuracy (Simple): 12.54    14