
Textbooks Are All You Need

About

We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises produced with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains a pass@1 accuracy of 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before the finetuning stage on a dataset of coding exercises, and phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval.
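The pass@1 figures quoted above follow the standard functional-correctness protocol for HumanEval-style benchmarks: sample n completions per problem, count how many pass the unit tests, and estimate the probability that at least one of k drawn samples passes. As a minimal sketch (the unbiased estimator introduced with HumanEval, not code from the phi-1 paper itself):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total completions sampled per problem
    c: number of those completions that pass all unit tests
    k: budget of samples allowed per problem

    Returns the expected probability that at least one of k
    randomly drawn samples (out of the n generated) is correct:
    1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer failing samples than the budget: some draw must succeed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k = 1 this reduces to the plain pass rate c / n,
# e.g. 5 passing out of 10 samples gives pass@1 = 0.5.
print(pass_at_k(10, 5, 1))
```

The benchmark-level score (e.g. phi-1's 50.6% on HumanEval) is this quantity averaged over all problems in the suite.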

Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li • 2023

Related benchmarks

Task                                    | Dataset                                      | Metric        | Result | Rank
Code Summarization                      | CodeXGLUE                                    | Java Score    | 14.6   | 38
Boundary Detection                      | RoFT-chatgpt GPT-3.5-turbo generated (test)  | Accuracy      | 36.5   | 34
Boundary Detection                      | RoFT original (test)                         | Accuracy      | 20.9   | 27
Code Translation                        | CodefuseEval                                 | Ja2Py Score   | 73.8   | 21
Multi-task Code Intelligence (Overall)  | CodefuseEval and CodeXGLUE                   | Overall Score | 41.4   | 21
Code Repair                             | CodeXGLUE                                    | Repair Rate   | 10.1   | 21
Clone Detection                         | CodeXGLUE                                    | Clone Score   | 93.5   | 21
Defect Detection                        | CodeXGLUE                                    | Defect Rate   | 0.611  | 21
