Alignment Data Map for Efficient Preference Data Selection and Diagnosis

About

Human preference data is essential for aligning large language models (LLMs) with human values, but collecting such data is often costly and inefficient, motivating the need for efficient data selection methods that reduce annotation costs while preserving alignment effectiveness. To address this issue, we propose Alignment Data Map, a data analysis tool for identifying and selecting effective preference data. We first compute alignment scores for the preference data using LLM-as-a-judge, explicit reward model, and reference-based approaches. The Alignment Data Map then characterizes each sample by both response quality and inter-response variability, derived from these alignment scores. In our experiments, training on only the 33% of samples that exhibit high quality and low variability achieves comparable or superior alignment performance on MT-Bench, Evol-Instruct, and AlpacaEval, compared to training on the full dataset. In addition, Alignment Data Map detects potential label misannotations by analyzing correlations between annotated labels and alignment scores, improving annotation accuracy. The implementation is available at https://github.com/01choco/Alignment-Data-Map.
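The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes that each sample comes with a list of alignment scores (one per response), that quality is the mean of those scores, and that variability is their population standard deviation. The rank-sum rule used to combine the two axes, the function names, and the "chosen scores lower than rejected" flag for possible misannotations are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def data_map_coordinates(scores):
    """Return (quality, variability) for one sample's response scores.

    Assumption: quality = mean score, variability = population std dev.
    """
    return mean(scores), pstdev(scores)


def select_subset(samples, fraction=1 / 3):
    """Pick the samples that rank best on high quality and low variability.

    samples: dict mapping sample id -> list of alignment scores.
    The rank-sum combination below is a hypothetical selection rule.
    """
    coords = {sid: data_map_coordinates(s) for sid, s in samples.items()}
    ids = list(coords)

    # Rank 0 is best: highest quality, lowest variability.
    by_quality = sorted(ids, key=lambda i: -coords[i][0])
    by_variability = sorted(ids, key=lambda i: coords[i][1])
    q_rank = {sid: r for r, sid in enumerate(by_quality)}
    v_rank = {sid: r for r, sid in enumerate(by_variability)}

    # Lower combined rank is better; ties are broken by sample id.
    ranked = sorted(ids, key=lambda i: (q_rank[i] + v_rank[i], i))
    k = max(1, round(len(ids) * fraction))
    return ranked[:k]


def flag_possible_misannotations(pairs):
    """Flag pairs whose annotated 'chosen' response scores below 'rejected'.

    pairs: dict mapping sample id -> (chosen_score, rejected_score).
    This simple disagreement check is an assumption, not the paper's test.
    """
    return [sid for sid, (chosen, rejected) in pairs.items() if chosen < rejected]
```

For example, `select_subset({"a": [9, 9], "b": [9, 1], "c": [2, 2]})` keeps a single sample, and `flag_possible_misannotations({"x": (7, 3), "y": (2, 8)})` flags only `"y"`, whose annotated label disagrees with its scores.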

Seohyeong Lee, Eunwon Kim, Hwaran Lee, Buru Chang • 2025

Related benchmarks

Task	Dataset	Metric	Result
Instruction Following	AlpacaEval	Win Rate	4.98	423
Instruction Following	MT-Bench	MT-Bench Score	4.63	287
Alignment	MT-Bench	MT-Bench Score	5.21	49
Instruction Following	Evol-Inst	Win Rate	26.8	34
LLM Alignment	AlpacaEval	Win Rate	25.24	24
LLM Alignment	Evol-Instruct	Win Rate	51.4	24
Multimodal Visual Question Answering	MMBench	Score	72.7	18
Multi-discipline Multimodal Understanding and Reasoning	MMMU	Overall Score	42.6	6

