Zephyr: Direct Distillation of LM Alignment

About

We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves task accuracy; however, these models are unaligned, i.e., they do not respond well to natural prompts. To distill alignment as well, we experiment with the use of preference data from AI Feedback (AIF). Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment. The approach requires only a few hours of training, with no additional sampling during fine-tuning. The final result, Zephyr-7B, sets the state of the art on chat benchmarks for 7B-parameter models and requires no human annotation. In particular, results on MT-Bench show that Zephyr-7B surpasses Llama2-Chat-70B, the best open-access RLHF-based model. Code, models, data, and tutorials for the system are available at https://github.com/huggingface/alignment-handbook.

Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf • 2023
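The abstract names two ingredients: preference pairs built from a teacher model's rankings (AIF), and the dDPO objective. The sketch below illustrates both in plain Python under stated assumptions. The pair-construction rule (top-scored completion as "chosen", a uniformly sampled other completion as "rejected"), the function names, and the default beta = 0.1 are illustrative choices, not details given in the text above. The loss is the standard DPO objective, -log σ(β[(log πθ(y_w|x) - log π_ref(y_w|x)) - (log πθ(y_l|x) - log π_ref(y_l|x))]), applied to precomputed sequence log-probabilities. The reference model π_ref is assumed to be the frozen dSFT model.

```python
import math
import random
from typing import Dict, Sequence


def build_preference_pair(
    prompt: str,
    completions: Sequence[str],
    teacher_scores: Sequence[float],
    rng: random.Random,
) -> Dict[str, str]:
    """Turn teacher-scored completions into one (chosen, rejected) pair.

    Assumed scheme (hypothetical helper): the top-scored completion is
    "chosen" and one of the remaining completions, sampled uniformly,
    is "rejected".
    """
    if len(completions) != len(teacher_scores) or len(completions) < 2:
        raise ValueError("need at least two completions, each with a score")
    best = max(range(len(completions)), key=lambda i: teacher_scores[i])
    others = [i for i in range(len(completions)) if i != best]
    worse = rng.choice(others)
    return {"prompt": prompt, "chosen": completions[best], "rejected": completions[worse]}


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # assumed default, not taken from the text
) -> float:
    """Mean DPO loss over a batch of sequence log-probabilities.

    Each entry is log pi(y | x) summed over the tokens of y; the ref_*
    values come from the frozen reference (assumed: dSFT) model.
    """
    n = len(policy_chosen)
    if n == 0 or not (len(policy_rejected) == len(ref_chosen) == len(ref_rejected) == n):
        raise ValueError("all four inputs must be non-empty and the same length")
    total = 0.0
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # Implicit reward margin: how much more the policy, relative to the
        # reference, favours the chosen response over the rejected one.
        margin = (pc - rc) - (pr - rr)
        total += neg_log_sigmoid(beta * margin)
    return total / n


if __name__ == "__main__":
    pair = build_preference_pair(
        "Explain DPO briefly.",
        ["answer A", "answer B", "answer C", "answer D"],
        [7.5, 9.0, 3.0, 6.0],  # hypothetical teacher scores
        random.Random(0),
    )
    print(pair["chosen"])  # the top-scored completion: "answer B"
    # The policy favours the chosen answer more than the reference does,
    # so the loss falls below log(2).
    print(dpo_loss([-10.0], [-12.0], [-11.0], [-11.0]))
```

Working on summed log-probabilities keeps the sketch framework-free; a real training run would compute these with a deep-learning library over batched token logits. Note that no sampling from the policy is needed: the loss only scores fixed, teacher-ranked completions, which is consistent with the abstract's claim of no additional sampling during fine-tuning.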

Related benchmarks

Task	Dataset	Metric	Result
Commonsense Reasoning	HellaSwag	Accuracy	82.79	1896
Commonsense Reasoning	WinoGrande	Accuracy	74.19	1442
Mathematical Reasoning	GSM8K	Accuracy	61.63	1398
Code Generation	HumanEval	--	--	1043
Question Answering	ARC Challenge	Accuracy	57.6	906
Multi-task Language Understanding	MMLU	Accuracy	58.9	881
Language Understanding	MMLU	Accuracy	56.9	844
Instruction Following	IFEval	IFEval Accuracy	35.3	836
Reasoning	BBH	--	--	726
Instruction Following	AlpacaEval 2.0	Win Rate	16.5	722

(10 of 86 rows shown.)

...

