NEFTune: Noisy Embeddings Improve Instruction Finetuning

About

We show that language model finetuning can be improved, sometimes dramatically, with a simple augmentation. NEFTune adds noise to the embedding vectors during training. Standard finetuning of LLaMA-2-7B using Alpaca achieves 29.79% on AlpacaEval, which rises to 64.69% using noisy embeddings. NEFTune also improves over strong baselines on modern instruction datasets. Models trained with Evol-Instruct see a 10% improvement, with ShareGPT an 8% improvement, and with OpenPlatypus an 8% improvement. Even powerful models further refined with RLHF such as LLaMA-2-Chat benefit from additional training with NEFTune.
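The method itself is a one-line change to the training forward pass: per the paper, noise is sampled from Uniform(-1, 1) and scaled by α/√(Ld), where L is the sequence length and d the embedding dimension, then added to the token embeddings during training only. Below is a minimal PyTorch sketch of this rule; the function name `neftune_embed` and the demo shapes are illustrative, not taken from the paper's codebase.

```python
import torch

def neftune_embed(embed_layer: torch.nn.Embedding,
                  input_ids: torch.Tensor,
                  alpha: float = 5.0,
                  training: bool = True) -> torch.Tensor:
    """Token embedding lookup with NEFTune noise.

    Noise ~ Uniform(-mag, mag) with mag = alpha / sqrt(L * d), added only
    at training time; inference uses the clean embeddings unchanged.
    """
    embeds = embed_layer(input_ids)            # (batch, L, d)
    if training:
        L, d = embeds.shape[-2], embeds.shape[-1]
        mag = alpha / (L * d) ** 0.5           # the paper's scaling rule
        embeds = embeds + torch.zeros_like(embeds).uniform_(-mag, mag)
    return embeds

# Illustrative usage with LLaMA-2-7B-like dimensions (hypothetical shapes).
embed = torch.nn.Embedding(32000, 4096)
ids = torch.randint(0, 32000, (2, 128))
noisy = neftune_embed(embed, ids, alpha=5.0)
```

The noise scale α is the single new hyperparameter (the paper sweeps values such as 5, 10, and 15), and because the noise is disabled at inference, NEFTune adds no cost at generation time.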

Neel Jain, Ping-yeh Chiang, Yuxin Wen, John Kirchenbauer, Hong-Min Chu, Gowthami Somepalli, Brian R. Bartoldson, Bhavya Kailkhura, Avi Schwarzschild, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Code Generation | HumanEval | - | - | 850 |
| Language Understanding | MMLU | Accuracy | 49.8 | 756 |
| Multi-turn Dialogue Evaluation | MT-Bench | Overall Score | 5.05 | 331 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 67.6 | 329 |
| Instruction Following | IFEval | Accuracy (0-100) | 42.7 | 292 |
| Science Question Answering | ARC-C | Accuracy | 55.9 | 127 |
| Code Generation | MBPP | Accuracy | 29.2 | 120 |
| Open-ended generation | AlpacaEval 2.0 | Win Rate | 287 | 43 |
| General Natural Language Processing | 18 Canonical NLP Tasks | Understanding & Knowledge | 65.9 | 23 |
| Open-ended generation | AlpacaEval 1.0 | Win Rate | 3.98e+3 | 23 |

(Showing 10 of 14 rows.)
