DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts

About

Despite recent advances in natural language generation, it remains challenging to control attributes of generated text. We propose DExperts: Decoding-time Experts, a decoding-time method for controlled text generation that combines a pretrained language model with "expert" LMs and/or "anti-expert" LMs in a product of experts. Intuitively, under the ensemble, tokens only get high probability if they are considered likely by the experts, and unlikely by the anti-experts. We apply DExperts to language detoxification and sentiment-controlled generation, where we outperform existing controllable generation methods on both automatic and human evaluations. Moreover, because DExperts operates only on the output of the pretrained LM, it is effective with (anti-)experts of smaller size, including when operating on GPT-3. Our work highlights the promise of tuning small LMs on text with (un)desirable attributes for efficient decoding-time steering.
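The product-of-experts combination described above can be illustrated with a minimal sketch. It assumes the three models share a vocabulary and that the ensemble is formed in logit space as `base + alpha * (expert - anti_expert)`, which after a softmax is proportional to P_base * (P_expert / P_anti)^alpha. The function names, the `alpha` steering weight, and the toy logits are hypothetical, chosen for illustration rather than taken from the authors' code.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(
    base: List[float],
    expert: List[float],
    anti_expert: List[float],
    alpha: float = 1.0,
) -> List[float]:
    """Add the scaled expert/anti-expert logit difference to the base logits.

    `alpha` is an assumed steering-strength hyperparameter; alpha = 0
    leaves the base model unchanged.
    """
    return [z + alpha * (zp - zm) for z, zp, zm in zip(base, expert, anti_expert)]


def steered_probs(
    base: List[float],
    expert: List[float],
    anti_expert: List[float],
    alpha: float = 1.0,
) -> List[float]:
    """Next-token distribution of the ensemble (product of experts)."""
    return softmax(combine_logits(base, expert, anti_expert, alpha))


# Toy three-token vocabulary (made-up logits, not real model outputs).
base_logits = [2.0, 2.0, 0.0]    # base LM is indifferent between tokens 0 and 1
expert_logits = [1.0, -1.0, 0.0]  # expert prefers token 0
anti_logits = [-1.0, 1.0, 0.0]    # anti-expert prefers token 1

base_probs = softmax(base_logits)
probs = steered_probs(base_logits, expert_logits, anti_logits, alpha=1.0)
print(base_probs)
print(probs)
```

In this toy case, token 0 (favoured by the expert and disfavoured by the anti-expert) gains probability mass relative to the base model, while token 1 loses it. Because only output logits are combined, the sketch never touches the base model's parameters, which matches the abstract's point that the method operates only on the pretrained LM's output.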

Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, Yejin Choi • 2021

Related benchmarks

Task	Dataset	Result
Math Reasoning	GSM8K	Accuracy: 50.5	254
Safety Evaluation	HarmBench	ASR: 29	148
Instruction Following	AlpacaEval 2.0 (test)	LC Win Rate (%): 16.58	95
Language model detoxification	RealToxicityPrompts (test)	Distinct-1: 58	54
Toxicity Mitigation	RealToxicityPrompts challenging	Avg Toxicity (Max): 52.7	46
Helpfulness alignment	HHH Alignment	Win Rate (WR): 55.4	44
Detoxification	AttaQ benchmark	Avg Toxicity (Max): 0.165	32
Detoxification	RealToxicityPrompts challenging	Max Toxicity: 0.527	32
Sentiment Steering	OpenWebText Neutral to Negative (test)	Perplexity (PPL): 32.86	27
Sentiment Steering	OpenWebText Neutral to Positive (test)	Perplexity (PPL): 30.52	27

Showing 10 of 49 rows
