OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection
About
Recent studies have demonstrated the significant potential of Large Language Models (LLMs) in generating Register Transfer Level (RTL) code, with notable advancements showcased by commercial models such as GPT-4 and Claude 3 Opus. However, these proprietary LLMs often raise concerns regarding privacy and security. While open-source LLMs offer solutions to these concerns, they typically underperform commercial models in RTL code generation tasks, primarily due to the scarcity of high-quality open-source RTL datasets. To address this challenge, we introduce OriGen, a fully open-source framework that incorporates self-reflection capabilities and a novel dataset augmentation methodology for generating high-quality, large-scale RTL code. Our approach employs a code-to-code augmentation technique to enhance the quality of open-source RTL code datasets. Furthermore, OriGen can rectify syntactic errors through a self-reflection process that leverages compiler feedback. Experimental results demonstrate that OriGen significantly outperforms other open-source alternatives in RTL code generation. It surpasses the previous best-performing open-source LLM by 12.8% and even exceeds GPT-4 Turbo in the pass@1 metric on the VerilogEval-Human benchmark. Moreover, OriGen exhibits superior capabilities in self-reflection and error correction, outperforming GPT-4 by 19.9% on a benchmark designed to evaluate self-reflection capabilities.
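To make the compiler-in-the-loop self-reflection idea concrete, here is a minimal sketch of such a loop. It is not the OriGen implementation: the `generate` callable, the `compile_verilog` and `self_reflect` helpers, the prompt format, and the choice of Icarus Verilog (`iverilog`) as the syntax checker are all illustrative assumptions.

```python
import os
import subprocess
import tempfile

def compile_verilog(code: str) -> str | None:
    """Run a syntax/elaboration check with Icarus Verilog.

    Returns None if the code compiles, otherwise the compiler's error output.
    (iverilog is an assumed stand-in for whatever compiler provides feedback.)
    """
    with tempfile.NamedTemporaryFile("w", suffix=".v", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            ["iverilog", "-t", "null", path],  # -t null: check only, emit no output file
            capture_output=True, text=True,
        )
        return None if result.returncode == 0 else result.stderr
    finally:
        os.unlink(path)

def self_reflect(generate, prompt: str, max_rounds: int = 3) -> str:
    """Regenerate RTL code using compiler diagnostics as feedback.

    `generate` is a hypothetical prompt -> Verilog-string LLM call.
    """
    code = generate(prompt)
    for _ in range(max_rounds):
        errors = compile_verilog(code)
        if errors is None:
            break  # syntactically valid; stop reflecting
        # Feed the failing code and its compiler errors back to the model.
        code = generate(
            f"{prompt}\n\nThe previous attempt failed to compile:\n{code}\n"
            f"Compiler errors:\n{errors}\nPlease fix the code."
        )
    return code
```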
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Verilog Code Generation | VerilogEval v1 (Human) | Pass@1: 54.4 | 54 |
| Verilog Code Generation | VerilogEval Machine | Pass@1: 74.1 | 37 |
| Verilog Code Generation | VerilogEval SR v2 | Pass@1: 49.3 | 34 |
| Verilog Code Generation | VerilogEval CC v2 | Pass@1: 49.3 | 33 |
| RTL generation | VerilogEval 156 cases (test) | Pass@1: 0.33 | 32 |
| Verilog Code Generation | RTLLM v1.1 | -- | 31 |
| Verilog Code Generation | RTLLM v2.0 | Pass@5: 65.91 | 17 |
| Verilog Code Generation | VerilogEval Machine v1 | Pass@1: 74.1 | 17 |
| RTL generation | RTLLM 50 cases (test) | Pass@1: 0.34 | 16 |
| Verilog Code Generation | RTLLM v1 | Pass@1: 50.6 | 16 |
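The Pass@k figures above are commonly computed with the unbiased estimator introduced by Chen et al. (2021): given n generated samples per problem of which c pass the tests, pass@k = E[1 − C(n−c, k) / C(n, k)]. A short reference implementation (assuming NumPy; not code from this repository):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples drawn from n generations (c of which are correct) passes."""
    if n - c < k:
        return 1.0  # too few failures for all k draws to fail
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))
```

For example, pass_at_k(n=20, c=10, k=1) returns 0.5, matching the intuition that a single draw from 20 samples with 10 correct passes half the time.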