Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model

About

While large language models have proven effective in a huge range of downstream applications, they often generate text that is problematic or lacks a desired attribute. In this paper, we introduce Reward-Augmented Decoding (RAD), a text generation procedure that uses a small unidirectional reward model to encourage a language model to generate text that has certain properties. Specifically, RAD uses the reward model to score generations as they are produced and rescales sampling probabilities to favor high-reward tokens. By using a unidirectional reward model, RAD can cache activations from prior generation steps to decrease computational overhead. Through experiments on generating non-toxic and sentiment-controlled text, we demonstrate that RAD performs best among methods that change only the generation procedure and matches the performance of state-of-the-art methods that involve re-training the language model. We further validate that RAD is effective on very large language models while incurring a minimal computational overhead.
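The decoding step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact rescaling rule, so the sketch assumes that the reward model scores the top-k candidate tokens and that each candidate's logit is shifted by a weight `beta` times its reward before sampling. The names `rad_step`, `toy_reward`, `k`, and `beta` are hypothetical, and the toy reward function stands in for a trained unidirectional reward model.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rad_step(logits, reward_fn, prefix, k=3, beta=5.0, rng=random):
    """One reward-augmented decoding step (illustrative assumption, not the paper's code).

    logits:    next-token logits from the language model, indexed by token id
    reward_fn: hypothetical callable mapping a token-id sequence to a reward in [0, 1]
    prefix:    token ids generated so far
    k, beta:   assumed hyperparameters (candidate count, reward weight)
    """
    # Keep only the k most likely tokens so the reward model scores k candidates.
    top_k = sorted(range(len(logits)), key=lambda t: logits[t], reverse=True)[:k]
    # Score each candidate continuation and shift its logit by beta * reward.
    adjusted = [logits[t] + beta * reward_fn(prefix + [t]) for t in top_k]
    probs = softmax(adjusted)
    # Sample from the rescaled distribution over the candidates.
    token = rng.choices(top_k, weights=probs, k=1)[0]
    return token, dict(zip(top_k, probs))


# Toy reward model: token 2 is treated as undesired and receives zero reward.
def toy_reward(tokens):
    return 0.0 if tokens[-1] == 2 else 1.0


lm_logits = [1.0, 0.5, 2.0, -1.0]
token, probs = rad_step(lm_logits, toy_reward, prefix=[0], k=3, beta=5.0,
                        rng=random.Random(0))
```

In this toy run, token 2 has the highest language-model logit but ends up with the lowest sampling probability among the three candidates because its reward is zero. The sketch calls the reward function once per candidate from scratch; the activation caching that the abstract credits for RAD's low overhead is omitted here.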

Haikang Deng, Colin Raffel • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Toxicity Mitigation | RealToxicityPrompts challenging | Avg Toxicity (Max): 6.2 | 46 |
| Detoxification | RealToxicityPrompts challenging | Max Toxicity: 0.062 | 32 |
| Detoxification | AttaQ benchmark | Avg Toxicity (Max): 0.045 | 32 |
| Detoxification | BoLD | Toxicity (Max): 1.9 | 28 |
| Toxicity Evaluation | BOLD 23679 prompts (test) | Avg Toxicity (Max): 0.031 | 18 |
| Toxicity Evaluation | AttaQ 1402 prompts (test) | Max Toxicity Score: 0.042 | 14 |
| Toxicity Evaluation | BoLD | Avg Toxicity (Max): 0.022 | 14 |
| Toxicity Evaluation | AttaQ | Max Toxicity Score: 0.04 | 14 |
