Preference Alignment with Flow Matching

About

We present Preference Flow Matching (PFM), a new framework for preference-based reinforcement learning (PbRL) that streamlines the integration of preferences into an arbitrary class of pre-trained models. Existing PbRL methods require fine-tuning pre-trained models, which presents challenges such as scalability, inefficiency, and the need for model modifications, especially with black-box APIs like GPT-4. In contrast, PFM utilizes flow matching techniques to directly learn from preference data, thereby reducing the dependency on extensive fine-tuning of pre-trained models. By leveraging flow-based models, PFM transforms less preferred data into preferred outcomes, and effectively aligns model outputs with human preferences without relying on explicit or implicit reward function estimation, thus avoiding common issues like overfitting in reward models. We provide theoretical insights that support our method's alignment with standard PbRL objectives. Experimental results indicate the practical effectiveness of our method, offering a new direction in aligning a pre-trained model to preference. Our code is available at https://github.com/jadehaus/preference-flow-matching.

Minu Kim, Yongsik Lee, Sehyeok Kang, Jihwan Oh, Song Chong, Se-Young Yun • 2024
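
The abstract describes learning a flow that carries less preferred outputs toward preferred ones. The sketch below illustrates that general idea on toy data and is not the authors' implementation. Several things here are assumptions made only for illustration: the straight-line interpolation path (a common flow matching choice), the linear vector field, the synthetic preference pairs, and the names `features` and `improve`. See the linked repository for the actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy preference pairs (hypothetical data): each less preferred sample y_minus
# has a preferred counterpart y_plus, here simply shifted by +2.
y_minus = rng.normal(loc=0.0, scale=0.5, size=(512, 1))
y_plus = y_minus + 2.0


def features(x, t):
    """Features [x, t, 1] for a linear vector field (an illustrative choice)."""
    return np.concatenate([x, t, np.ones_like(t)], axis=1)


# Flow matching regression: sample t ~ U[0, 1], take the point on the straight
# path from y_minus to y_plus, and regress onto the path velocity y_plus - y_minus.
t = rng.uniform(size=(512, 1))
x_t = (1.0 - t) * y_minus + t * y_plus
target_velocity = y_plus - y_minus
weights, *_ = np.linalg.lstsq(features(x_t, t), target_velocity, rcond=None)


def improve(y, n_steps=20):
    """Euler-integrate dx/dt = v(x, t) from t=0 to t=1, starting at y."""
    x = y.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t_k = np.full((x.shape[0], 1), k * dt)
        x = x + dt * (features(x, t_k) @ weights)
    return x


# Push the less preferred samples along the learned flow.
improved = improve(y_minus)
```

In this sketch the pre-trained model that produced `y_minus` is never modified; only the small vector field is fitted, and it is applied to outputs after the fact. A practical version would presumably replace the linear field with a neural network and the toy pairs with real preference data.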

Related benchmarks

Task	Dataset	Result
Sentiment Review Generation	Sentiment Review 100 instances (test)	Avg Preference Score: 2.7894	6
Delay-robust robot control	Kinetix	Success Rate (Avg d=0-7): 78.4	6
Offline Reinforcement Learning	D4RL MuJoCo v2	Ant Return (Random): 31.62	4
Sentiment Review Generation	Sentiment review generation 100 samples (test)	Win Rate vs π_PPO: 100	4
