
InstructDiff: Domain-Adaptive Data Selection via Differential Entropy for Efficient LLM Fine-Tuning

About

Supervised fine-tuning (SFT) is fundamental to adapting large language models, yet training on complete datasets incurs prohibitive costs with diminishing returns. Existing data selection methods suffer from severe domain specificity: techniques optimized for general instruction-following fail on reasoning tasks, and vice versa. We observe that measuring entropy differences between base models and minimally instruction-tuned calibrated models reveals a pattern -- samples with the lowest differential entropy consistently yield optimal performance across domains, yet this principle manifests domain-adaptively: reasoning tasks favor entropy increase (cognitive expansion), while general tasks favor entropy decrease (cognitive compression). We introduce InstructDiff, a unified framework that operationalizes differential entropy as a domain-adaptive selection criterion through warmup calibration, bi-directional NLL filtering, and entropy-based ranking. Extensive experiments show that InstructDiff achieves a 17% relative improvement over full-data training on mathematical reasoning and 52% on general instruction-following, outperforming prior baselines while using only 10% of the data.
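The ranking step described above can be sketched roughly as follows. The paper's exact scoring procedure is not reproduced here, so the entropy computation over explicit token distributions, the per-domain sort direction, and the 10% budget are illustrative assumptions; in practice the distributions would come from the base and warmup-calibrated models' next-token logits.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def differential_entropy_score(base_dists, calib_dists):
    """Mean per-token entropy of the calibrated model minus the base model.

    Negative -> entropy decrease (cognitive compression);
    positive -> entropy increase (cognitive expansion).
    """
    base = sum(token_entropy(d) for d in base_dists) / len(base_dists)
    calib = sum(token_entropy(d) for d in calib_dists) / len(calib_dists)
    return calib - base

def select_subset(samples, domain, budget=0.10):
    """Rank samples by differential entropy and keep the top `budget` fraction.

    Per the abstract's observation, reasoning domains favor entropy
    increase and general domains favor entropy decrease, so the sort
    direction flips with the domain (an assumed operationalization).
    Each sample is a dict with a precomputed "diff_entropy" field.
    """
    reverse = (domain == "reasoning")  # largest increase first for reasoning
    ranked = sorted(samples, key=lambda s: s["diff_entropy"], reverse=reverse)
    k = max(1, int(len(ranked) * budget))
    return ranked[:k]
```

Note this sketch omits the bi-directional NLL filtering stage, which would prune outlier samples before the entropy-based ranking is applied.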

Junyou Su, He Zhu, Xiao Luo, Liyu Zhang, Hong-Yu Zhou, Yun Chen, Peng Li, Yang Liu, Guanhua Chen • 2026

Related benchmarks

Task | Dataset | Result | Rank
Medical Knowledge Question Answering | Medical Domain (MedQA, MMLU, MedMCQA) (test) | MedQA Score: 54.67 | 45
Instruction Following | General Domain (AlpacaEval, Arena-Hard), LLaMA3-8B (10% selection) | AlpacaEval Score: 12.09 | 18
Math Problem Solving | Math Domain (AIME24, Math-OAI, Minerva, Olympiad, AMC23), Qwen2.5-7B (10% selection) | AIME24 Score: 7.71 | 18
Code Generation | Code Domain (HumanEval, HumanEval+, MBPP, MBPP+, Bigcode) (test) | HumanEval Score: 48.2 | 18
