SLiC-HF: Sequence Likelihood Calibration with Human Feedback

About

Learning from human feedback has been shown to be effective at aligning language models with human preferences. Past work has often relied on Reinforcement Learning from Human Feedback (RLHF), which optimizes the language model using reward scores assigned from a reward model trained on human preference data. In this work we show how the recently introduced Sequence Likelihood Calibration (SLiC), can also be used to effectively learn from human preferences (SLiC-HF). Furthermore, we demonstrate this can be done with human feedback data collected for a different model, similar to off-policy, offline RL data. Automatic and human evaluation experiments on the TL;DR summarization task show that SLiC-HF significantly improves supervised fine-tuning baselines. Furthermore, SLiC-HF presents a competitive alternative to the PPO RLHF implementation used in past work while being much simpler to implement, easier to tune and more computationally efficient in practice.

Yao Zhao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh, Peter J. Liu• 2023

Related benchmarks

Task	Dataset	Result
Instruction Following	AlpacaEval 2.0	Win Rate32.5	722
Instruction Following	AlpacaEval	Win Rate37.03	420
Instruction Following	Arena Hard	Win Rate42.76	263
Instruction Following	AlpacaEval 2	LC (%)26.9	137
Multi-turn Instruction Following	MT-Bench	MT-Bench Score (GPT-4)8.1	129
LLM Alignment Evaluation	AlpacaEval 2	LC Win Rate36.7	89
Multi-turn conversation	MT-Bench	--	41
Language Model Alignment Evaluation	Arena Hard v0.1	Win Rate (%)25.1	36
Dialogue Generation	Anthropic HH (test)	Average Preference Score61.62	16
Conversational Evaluation	AlpacaEval 2.0	LC15.06	14

Showing 10 of 13 rows

Other info

Follow for update

@wizwand_team Discord