NEFTune: Noisy Embeddings Improve Instruction Finetuning

About

We show that language model finetuning can be improved, sometimes dramatically, with a simple augmentation. NEFTune adds noise to the embedding vectors during training. Standard finetuning of LLaMA-2-7B using Alpaca achieves 29.79% on AlpacaEval, which rises to 64.69% using noisy embeddings. NEFTune also improves over strong baselines on modern instruction datasets. Models trained with Evol-Instruct see a 10% improvement, with ShareGPT an 8% improvement, and with OpenPlatypus an 8% improvement. Even powerful models further refined with RLHF such as LLaMA-2-Chat benefit from additional training with NEFTune.

Neel Jain, Ping-yeh Chiang, Yuxin Wen, John Kirchenbauer, Hong-Min Chu, Gowthami Somepalli, Brian R. Bartoldson, Bhavya Kailkhura, Avi Schwarzschild, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein • 2023
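
The noise injection described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The abstract only says that noise is added to the embedding vectors during training. The specific choices here are assumptions: uniform noise in [-1, 1], a scale of alpha / sqrt(seq_len * dim), and the default alpha = 5.0. The function names `add_neftune_noise` and `embed` are hypothetical.

```python
import math
import random


def add_neftune_noise(embeddings, alpha=5.0, rng=None):
    """Return a noisy copy of a (seq_len x dim) list of embedding vectors.

    Assumption: each entry receives uniform noise from [-1, 1], scaled by
    alpha / sqrt(seq_len * dim). The input list is not modified.
    """
    if not embeddings:
        return []
    rng = rng or random.Random()
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [
        [value + scale * rng.uniform(-1.0, 1.0) for value in token_vector]
        for token_vector in embeddings
    ]


def embed(token_ids, embedding_table, training, alpha=5.0, rng=None):
    """Look up token embeddings and perturb them only while training."""
    vectors = [list(embedding_table[t]) for t in token_ids]
    if training:
        return add_neftune_noise(vectors, alpha=alpha, rng=rng)
    return vectors


# Toy embedding table: 10 tokens, 4 dimensions each.
table = [[0.1 * i + 0.01 * j for j in range(4)] for i in range(10)]
clean = embed([1, 2, 3], table, training=False)
noisy = embed([1, 2, 3], table, training=True, rng=random.Random(0))
```

With this scaling, every entry of `noisy` differs from the matching entry of `clean` by at most alpha / sqrt(seq_len * dim). At inference time (`training=False`) the embeddings are returned unchanged, so the perturbation affects finetuning only.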

Related benchmarks

Task	Dataset	Result
Code Generation	HumanEval	--	1043
Language Understanding	MMLU	Accuracy 49.8	844
Instruction Following	IFEval	--	836
Commonsense Reasoning	HellaSwag	HellaSwag Accuracy 80.6	711
Physical Commonsense Reasoning	PIQA	Accuracy 67.6	696
Multi-turn Dialogue Evaluation	MT-Bench	Overall Score 5.05	532
Instruction Following	AlpacaEval	Win Rate 67.6	420
Science Question Answering	ARC-C	Accuracy 55.9	261
Reasoning	ARC	Accuracy 56.1	245
Code Generation	MBPP	Accuracy 29.2	165

Showing 10 of 30 rows
