CodeGemma: Open Code Models Based on Gemma

About

This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma, capable of a variety of code and natural language generation tasks. We release three model variants. CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural language understanding, excel in mathematical reasoning, and match code capabilities of other open models. CodeGemma 2B is a state-of-the-art code completion model designed for fast code infilling and open-ended generation in latency-sensitive settings.

CodeGemma Team: Heri Zhao, Jeffrey Hui, Joshua Howland, Nam Nguyen, Siqi Zuo, Andrea Hu, Christopher A. Choquette-Choo, Jingyue Shen, Joe Kelley, Kshitij Bansal, Luke Vilnis, Mateo Wirth, Paul Michel, Peter Choy, Pratik Joshi, Ravin Kumar, Sarmad Hashmi, Shubham Agrawal, Zhitao Gong, Jane Fine, Tris Warkentin, Ale Jakse Hartman, Bin Ni, Kathy Korevec, Kelly Schaefer, Scott Huffman • 2024

Related benchmarks

Task	Dataset	Metric	Result
Code Generation	HumanEval	Pass@1	54.88	1048
Code Generation	HumanEval+	Pass@1	41.46	393
Code Generation	MBPP+	Pass@1	54.76	238
Code Generation	MBPP	Pass@1 Accuracy	53.2	59
Code Generation	LiveCodeBench	Pass@1	8.12	51
Code Completion	APC Hard Completion 1.0 (test)	HCR	100	33
Code Completion	APC Placeholder Completion 1.0 (test)	PCR	0.00	33
Code Generation	BigCodeBench	Pass@1	25.44	32
Severity classification	Smart Contract Audit Dataset	Precision	40.78	20
Vulnerability Detection	Smart Contract Audit Dataset	Precision	96.39	20
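
Most of the code generation rows above report Pass@1. This page does not say how those scores were computed, so the following is only a minimal sketch of the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass the unit tests. The function names and the sample counts in the usage lines are hypothetical and are not taken from the CodeGemma evaluation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all tests
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, as a percentage.

    results: one (n, c) pair per problem.
    """
    if not results:
        raise ValueError("results must not be empty")
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical per-problem counts: (samples drawn, samples passing).
example_results = [(10, 3), (10, 10), (10, 0), (10, 7)]
example_score = benchmark_pass_at_k(example_results, k=1)
```

With k = 1 the estimator reduces to the fraction of passing samples per problem, averaged over problems, so a single greedy sample per problem (n = 1) gives the plain percentage of problems solved.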

