Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

About

The alignment of large language models (LLMs) often assumes that using more clean data yields better outcomes, overlooking the match between model capacity and example difficulty. Challenging this, we propose a new principle: Preference data vary in difficulty, and overly difficult examples hinder alignment, by exceeding the model's capacity. Through systematic experimentation, we validate this principle with three key findings: (1) preference examples vary in difficulty, as evidenced by consistent learning orders across alignment runs; (2) overly difficult examples significantly degrade performance across four LLMs and two datasets; and (3) the capacity of a model dictates its threshold for handling difficult examples, underscoring a critical relationship between data selection and model capacity. Building on this principle, we introduce Selective DPO, which filters out overly difficult examples. This simple adjustment improves alignment performance by 9-16% in win rates on the AlpacaEval 2 benchmark compared to the DPO baseline, suppressing a series of DPO variants with different algorithmic adjustments. Together, these results illuminate the importance of aligning data difficulty with model capacity, offering a transformative perspective for improving alignment strategies in LLMs. Code is available at https://github.com/glorgao/SelectiveDPO.

Chengqian Gao, Haonan Li, Liu Liu, Zeke Xie, Peilin Zhao, Zhiqiang Xu• 2025

Related benchmarks

Task	Dataset	Result
Instruction Following	AlpacaEval 2.0	Win Rate1.12	722
Reward Modeling	RewardBench	Chat Score86.54	216
Instruction Following	AlpacaEval 2.0 (test)	LC Win Rate (%)3.25	95
Preference Evaluation	AlpacaEval 2	WR (%)559	48
Reward Modeling	RewardBench (full)	Chat Score78.6	41
Multi-turn Chat Evaluation	MT-Bench	MT-Bench Score7.74	20
General Chat Evaluation	Arena Hard	Win Rate60.4	16
Instruction Following Evaluation	AlpacaEval 2	Win Rate38.02	16
Downstream Task Evaluation	OpenLLM Leaderboard v1 (test)	MMLU (5-shot)63.95	14
Direct Preference Optimization	SHP AlpacaEval 2.0	LCWR16.69	14

Showing 10 of 16 rows

Other info

Follow for update

@wizwand_team Discord