Alignment Data Map for Efficient Preference Data Selection and Diagnosis
About
Human preference data is essential for aligning large language models (LLMs) with human values, but collecting such data is often costly and inefficient. This motivates efficient data selection methods that reduce annotation costs while preserving alignment effectiveness. To address this issue, we propose Alignment Data Map, a data analysis tool for identifying and selecting effective preference data. We first compute alignment scores for the preference data using LLM-as-a-judge, explicit reward model, and reference-based approaches. The Alignment Data Map then considers both response quality and inter-response variability, both derived from these alignment scores. In our experiments, training on only 33% of the samples, those that exhibit high quality and low variability, achieves comparable or superior alignment performance on MT-Bench, Evol-Instruct, and AlpacaEval compared to training on the full dataset. In addition, Alignment Data Map detects potential label misannotations by analyzing correlations between annotated labels and alignment scores, improving annotation accuracy. The implementation is available at https://github.com/01choco/Alignment-Data-Map.
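To make the selection idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each sample carries a list of per-response alignment scores (`scores`) and the index of the annotated preferred response (`chosen`); these field names are hypothetical, as are the function names. Quality is taken as the mean score and variability as the population standard deviation, and the two are combined by a simple rank sum. That combination rule and the "chosen scores below the best response" misannotation check are illustrative assumptions; the abstract does not specify either.

```python
from statistics import mean, pstdev


def data_map_coordinates(scores):
    """Return (quality, variability) for one sample's response scores."""
    return mean(scores), pstdev(scores)


def select_subset(samples, fraction=0.33):
    """Keep about `fraction` of samples, favoring high quality and low variability.

    Assumption: the two criteria are combined by summing rank positions.
    """
    coords = [data_map_coordinates(s["scores"]) for s in samples]
    n = len(samples)
    by_quality = sorted(range(n), key=lambda i: -coords[i][0])     # best quality first
    by_variability = sorted(range(n), key=lambda i: coords[i][1])  # lowest spread first
    combined = [0] * n
    for rank, i in enumerate(by_quality):
        combined[i] += rank
    for rank, i in enumerate(by_variability):
        combined[i] += rank
    keep = max(1, round(fraction * n))
    best = sorted(range(n), key=lambda i: combined[i])[:keep]
    return [samples[i] for i in sorted(best)]  # preserve original order


def flag_possible_misannotations(samples):
    """Flag samples whose annotated 'chosen' response scores below the best one.

    Assumption: a label that disagrees with the alignment scores is suspicious.
    """
    return [s for s in samples if s["scores"][s["chosen"]] < max(s["scores"])]


# Toy data with made-up scores, for illustration only.
samples = [
    {"id": "a", "scores": [9.0, 8.5], "chosen": 0},
    {"id": "b", "scores": [9.0, 3.0], "chosen": 1},
    {"id": "c", "scores": [4.0, 3.5], "chosen": 0},
]
selected = select_subset(samples, fraction=0.33)
flagged = flag_possible_misannotations(samples)
print([s["id"] for s in selected], [s["id"] for s in flagged])
```

On the toy data, sample `a` is kept because it ranks first on both criteria, and sample `b` is flagged because its annotated choice scores lower than the other response. A rank sum is used here only because it avoids having to put the mean and the spread on a common scale.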
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Instruction Following | AlpacaEval | Win Rate | 4.98 | 420 |
| Instruction Following | MT-Bench | MT-Bench Score | 4.63 | 287 |
| Alignment | MT-Bench | MT-Bench Score | 5.21 | 49 |
| Instruction Following | Evol-Inst | Win Rate | 26.8 | 34 |
| LLM Alignment | AlpacaEval | Win Rate | 25.24 | 24 |
| LLM Alignment | Evol-Instruct | Win Rate | 51.4 | 24 |
| Multimodal Visual Question Answering | MMBench | Score | 72.7 | 18 |
| Multi-discipline Multimodal Understanding and Reasoning | MMMU | Overall Score | 42.6 | 6 |