InstructDiff: Domain-Adaptive Data Selection via Differential Entropy for Efficient LLM Fine-Tuning
About
Supervised fine-tuning (SFT) is fundamental to adapting large language models, yet training on complete datasets incurs prohibitive costs with diminishing returns. Existing data selection methods suffer from severe domain specificity: techniques optimized for general instruction following fail on reasoning tasks, and vice versa. We observe that measuring entropy differences between base models and minimally instruction-tuned (calibrated) models reveals a consistent pattern: samples with the lowest differential entropy yield optimal performance across domains. This principle is domain-adaptive, however: reasoning tasks favor entropy increase (cognitive expansion), while general tasks favor entropy decrease (cognitive compression). We introduce InstructDiff, a unified framework that operationalizes differential entropy as a domain-adaptive selection criterion through warmup calibration, bi-directional NLL filtering, and entropy-based ranking. Extensive experiments show that InstructDiff achieves a 17% relative improvement over full-data training on mathematical reasoning and 52% on general instruction following, outperforming prior baselines while using only 10% of the data.
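The selection criterion above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes you have already computed each sample's mean per-token entropy under the base model and under the warmup-calibrated model, and it ranks samples by the entropy difference, keeping the direction appropriate to the domain (entropy decrease for general tasks, entropy increase for reasoning tasks). The function and parameter names are invented for this sketch.

```python
import numpy as np

def token_entropy(probs):
    """Mean per-token Shannon entropy of a sequence, given each token's
    next-token probability distribution (list of probability vectors)."""
    return float(np.mean([-np.sum(p * np.log(p)) for p in probs]))

def select_by_differential_entropy(base_H, calib_H, ratio=0.1, direction="decrease"):
    """Rank samples by differential entropy (calibrated minus base) and
    keep a `ratio` fraction of them.

    direction="decrease": prefer samples whose entropy dropped most after
    calibration (general instruction-following, per the abstract).
    direction="increase": prefer samples whose entropy rose most
    (reasoning tasks, per the abstract).
    """
    diff = np.asarray(calib_H, dtype=float) - np.asarray(base_H, dtype=float)
    order = np.argsort(diff) if direction == "decrease" else np.argsort(-diff)
    k = max(1, int(len(diff) * ratio))
    return order[:k].tolist()
```

With per-sample entropies in hand, a 10% budget is just `ratio=0.1`; the paper's bi-directional NLL filtering would run before this ranking step and is omitted here.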
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Medical Knowledge Question Answering | Medical Domain (MedQA, MMLU, MedMCQA), test | MedQA score 54.67 | 45 |
| Instruction Following | General Domain (AlpacaEval, Arena-Hard), LLaMA3-8B, 10% selection | AlpacaEval score 12.09 | 18 |
| Math Problem Solving | Math Domain (AIME24, Math-OAI, Minerva, Olympiad, ACM23), Qwen2.5-7B, 10% selection | AIME24 score 7.71 | 18 |
| Code Generation | Code Domain (HumanEval, HumanEval+, MBPP, MBPP+, Bigcode), test | HumanEval score 48.2 | 18 |