BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling

About

This paper concerns the problem of aligning samples from large language models to human preferences using best-of-$n$ sampling, where we draw $n$ samples, rank them, and return the best one. We consider two fundamental problems. First: what is the relationship between best-of-$n$ and approaches to alignment that train LLMs to output samples with a high expected reward (e.g., RLHF or DPO)? To answer this, we embed both the best-of-$n$ distribution and the sampling distributions learned by alignment procedures in a common class of tiltings of the base LLM distribution. We then show that, within this class, best-of-$n$ is essentially optimal in terms of the trade-off between win rate against the base model and KL divergence from the base model. That is, best-of-$n$ is the best choice of alignment distribution if the goal is to maximize win rate. However, best-of-$n$ requires drawing $n$ samples for each inference, a substantial cost. To avoid this, the second problem we consider is how to fine-tune an LLM to mimic the best-of-$n$ sampling distribution. We derive BoNBoN Alignment to achieve this by exploiting the special structure of the best-of-$n$ distribution. Experiments show that BoNBoN Alignment yields substantial improvements in producing a model that is preferred to the base policy while minimally affecting off-target aspects.
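As a rough illustration of the sampling procedure described above (draw $n$ samples, rank them, return the best one), here is a minimal Python sketch. It is not the authors' implementation: the names `best_of_n` and `estimate_win_rate` are hypothetical, and the "model" is a toy stand-in that emits uniform random numbers scored by their own value, chosen only so the example runs without an LLM or a reward model.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def best_of_n(sample: Callable[[], T], reward: Callable[[T], float], n: int) -> T:
    """Draw n independent samples and return the one with the highest reward."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample() for _ in range(n)]
    return max(candidates, key=reward)


def estimate_win_rate(
    sample: Callable[[], T],
    reward: Callable[[T], float],
    n: int,
    trials: int = 20000,
) -> float:
    """Monte Carlo estimate of how often a best-of-n sample beats a base sample."""
    wins = 0
    for _ in range(trials):
        aligned = best_of_n(sample, reward, n)  # sample from the best-of-n policy
        base = sample()                         # one ordinary sample from the base
        if reward(aligned) > reward(base):
            wins += 1
    return wins / trials


# Toy stand-ins (assumptions for illustration only):
# the "base model" emits uniform numbers and the "reward" is the number itself.
rng = random.Random(0)
toy_sample = rng.random


def toy_reward(x: float) -> float:
    return x


win_rate_n4 = estimate_win_rate(toy_sample, toy_reward, n=4)
```

In this toy setting the rewards are continuous and i.i.d., so the maximum of $n$ draws beats an independent draw with probability $n/(n+1)$; the estimate for $n = 4$ should therefore land near 0.8. The sketch also makes the inference cost visible: every call to `best_of_n` invokes the sampler $n$ times.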

Lin Gui, Cristina Gârbacea, Victor Veitch • 2024

Related benchmarks

Task	Dataset	Result
Code Generation	MBPP+	Accuracy: 73	243
Code Generation	HumanEval	Accuracy: 87.9	224
Mathematical Reasoning	Olympiad	Accuracy: 0.556	136
Mathematical Reasoning	Mathematical Reasoning Suite (Math500, Gaokao En, Olympiad, GSM8K, AMC23, AIME25, AIME24)	Math500 Score: 87.4	18
