
Alignment Data Map for Efficient Preference Data Selection and Diagnosis

About

Human preference data is essential for aligning large language models (LLMs) with human values, but collecting such data is often costly and inefficient, motivating the need for efficient data selection methods that reduce annotation costs while preserving alignment effectiveness. To address this issue, we propose Alignment Data Map, a data analysis tool for identifying and selecting effective preference data. We first evaluate alignment scores of the preference data using LLM-as-a-judge, explicit reward model, and reference-based approaches. The Alignment Data Map considers both response quality and inter-response variability based on the alignment scores. In our experiments, training on only 33% of the samples, those exhibiting high quality and low variability, achieves comparable or superior alignment performance on MT-Bench, Evol-Instruct, and AlpacaEval compared to training on the full dataset. In addition, Alignment Data Map detects potential label misannotations by analyzing correlations between annotated labels and alignment scores, improving annotation accuracy. The implementation is available at https://github.com/01choco/Alignment-Data-Map.
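The selection idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so this assumes quality is the mean alignment score of a sample's responses, variability is their population standard deviation, and samples are ranked by quality minus variability. The function names and the rule that flags a pair when the annotated "chosen" response scores below the "rejected" one are likewise hypothetical simplifications.

```python
from statistics import mean, pstdev


def data_map_coordinates(scores):
    """Return (quality, variability) for one sample's response scores.

    Assumption: quality = mean score, variability = population std. dev.
    """
    return mean(scores), pstdev(scores)


def select_high_quality_low_variability(samples, fraction=1 / 3):
    """Return the ids of the top `fraction` of samples.

    `samples` maps a sample id to the list of alignment scores of its
    responses. The ranking key (quality minus variability) is an
    illustrative assumption, not the paper's criterion.
    """
    def key(sample_id):
        quality, variability = data_map_coordinates(samples[sample_id])
        return quality - variability

    ranked = sorted(samples, key=key, reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


def flag_possible_mislabels(pairs):
    """Flag ids whose annotated 'chosen' score is below the 'rejected' score.

    `pairs` maps a sample id to (chosen_score, rejected_score). This is a
    hypothetical stand-in for the correlation analysis in the abstract.
    """
    return [sid for sid, (chosen, rejected) in pairs.items() if chosen < rejected]


# Toy alignment scores (made up for illustration only).
samples = {
    "a": [9.0, 8.5],
    "b": [9.0, 3.0],
    "c": [4.0, 3.5],
    "d": [8.0, 8.0],
    "e": [2.0, 1.0],
    "f": [7.0, 5.0],
}
selected = select_high_quality_low_variability(samples)
suspect = flag_possible_mislabels({"x": (8.0, 3.0), "y": (2.0, 7.0)})
```

With six toy samples and the default fraction of one third, two sample ids are kept. Any real use would replace the toy scores with scores from a judge model, a reward model, or a reference-based metric, as the abstract describes.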

Seohyeong Lee, Eunwon Kim, Hwaran Lee, Buru Chang • 2025

Related benchmarks

Task | Dataset | Result | Rank
Instruction Following | AlpacaEval | Win Rate 4.98 | 420
Instruction Following | MT-Bench | MT-Bench Score 4.63 | 287
Alignment | MT-Bench | MT-Bench Score 5.21 | 49
Instruction Following | Evol-Inst | Win Rate 26.8 | 34
LLM Alignment | AlpacaEval | Win Rate 25.24 | 24
LLM Alignment | Evol-Instruct | Win Rate 51.4 | 24
Multimodal Visual Question Answering | MMBench | Score 72.7 | 18
Multi-discipline Multimodal Understanding and Reasoning | MMMU | Overall Score 42.6 | 6