
Data Selection for LLM Alignment Using Fine-Grained Preferences

About

Large language model (LLM) alignment aims to ensure that the behavior of LLMs matches human preferences. While collecting data from multiple fine-grained, aspect-specific preferences is becoming increasingly feasible, existing alignment methods typically operate on a single preference signal and thus struggle with the conflicts inherent in such aggregated datasets. As an early attempt, in this paper, we propose a data-centric approach to align LLMs through the effective use of fine-grained preferences. Specifically, we formulate the problem as direct fine-grained preference optimization and introduce preference divergence (PD), which quantifies inter-aspect preference conflicts. Instead of directly tackling the resulting complicated optimization, we recast it as a data selection problem and propose a simple yet effective strategy that identifies the subset of data with the most negative PD values for efficient training. We theoretically analyze the loss-bound optimality of our selection strategy and conduct extensive empirical studies on varied settings and datasets, demonstrating that our practical selection method achieves consistent improvements over standard full-data alignment while using as little as 30% of the data. Our work provides evidence that LLM alignment using fine-grained preferences is highly feasible.
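The selection strategy described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes each example already has a scalar preference-divergence (PD) score and simply keeps the fraction of examples with the most negative values.

```python
import numpy as np

def select_by_preference_divergence(pd_scores, fraction=0.3):
    """Return indices of the subset with the most negative PD values.

    pd_scores: one PD value per training example (hypothetical input;
    the paper's actual PD computation is not reproduced here).
    fraction: portion of the dataset to keep, e.g. 0.3 for 30%.
    """
    k = max(1, int(len(pd_scores) * fraction))
    order = np.argsort(pd_scores)  # ascending: most negative PD first
    return order[:k]               # indices of the selected subset

# Synthetic example: keep the 30% of examples with the most negative PD
scores = np.array([0.5, -1.2, 0.1, -0.8, 0.9, -2.0, 0.3, -0.1, 0.7, -1.5])
selected = select_by_preference_divergence(scores, fraction=0.3)
```

Under this sketch, training would then proceed on only the selected subset rather than the full dataset.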

Jia Zhang, Yao Liu, Chen-Xi Zhang, Yi Liu, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li• 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| LLM Alignment | HelpSteer (test) | AlpacaEval 2 Win Rate (WR): 8.34 | 27 |
| LLM Alignment | UltraFeedback (test) | AlpacaEval 2 Win Rate (WR): 21 | 18 |
| LLM Alignment | Taobao Live proprietary fine-grained preference dataset | Win Score: 1.53 | 13 |
