DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections

About

As numerous instruction-tuning datasets continue to emerge, dynamically balancing and optimizing their mixtures has become a critical challenge. To address this, we propose DynamixSFT, a dynamic and automated method for instruction-tuning dataset mixture optimization. We formulate the problem as a multi-armed bandit setup and introduce a Prior-scaled Boltzmann Exploration that softly anchors the updated sampling distribution to the original dataset proportions, thereby preserving the inherent diversity and coverage of the collection. Sampling probabilities are updated using a lightweight 1-Step Look-ahead Reward, reflecting how much the dataset contributes to improving the model's performance at its current state. We demonstrate that DynamixSFT effectively optimizes the Tulu-2-mixture and Tulu-3-mixture collections across 10 benchmarks, while introducing minimal computational overhead over naive sampling. Furthermore, we provide a comprehensive analysis and visualizations to offer deeper insights into the adaptive dynamics of our method.

Haebin Shin, Lei Ji, Xiao Liu, Zhiwei Yu, Hyunwoo Yoo, Qi Chen, Yeyun Gong • 2025
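To make the bandit formulation in the abstract concrete, here is a minimal sketch of one way a prior-scaled Boltzmann sampling distribution could be computed. The abstract does not give the exact formula, so everything here is an assumption for illustration: the form `p_i ∝ prior_i * exp(q_i / temperature)`, the exponential-moving-average reward update, and the names `prior_scaled_boltzmann`, `update_estimate`, `temperature`, and `step_size`. It should not be read as the authors' implementation.

```python
import math


def prior_scaled_boltzmann(priors, q_values, temperature=1.0):
    """Return sampling probabilities proportional to prior_i * exp(q_i / temperature).

    Assumed form (not taken from the paper): the original dataset proportions
    act as a multiplicative prior on a standard Boltzmann (softmax) policy, so
    that equal reward estimates reproduce the original mixture.
    """
    if len(priors) != len(q_values):
        raise ValueError("priors and q_values must have the same length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum estimate before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [p * math.exp((q - q_max) / temperature)
               for p, q in zip(priors, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


def update_estimate(q_old, reward, step_size=0.1):
    """Exponential moving average of observed rewards (hypothetical update rule)."""
    return (1.0 - step_size) * q_old + step_size * reward


# Three datasets with original mixture proportions 50% / 30% / 20%.
priors = [0.5, 0.3, 0.2]
q_values = [0.0, 0.0, 0.0]

# With equal estimates, the distribution stays at the original proportions.
initial_probs = prior_scaled_boltzmann(priors, q_values)

# Suppose a look-ahead reward of 1.0 was observed for the first dataset.
q_values[0] = update_estimate(q_values[0], reward=1.0)
updated_probs = prior_scaled_boltzmann(priors, q_values)
```

Under these assumptions, a dataset whose reward estimate rises receives a larger share of samples, while the multiplicative prior keeps every dataset with a nonzero original proportion in the mixture.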

Related benchmarks

Task	Dataset	Result
Science and Knowledge Question Answering	Science and Knowledge Aggregate: SciQ, ARC-Easy, MedMCQA (test)	Accuracy 75.2	42
Multi-task Language Model Evaluation	TÜLU Evaluation Suite (MMLU, TQA, PopQA, BBH, DROP, CHE, CHE+, GSM8K, MATH, IFEval) 2/3	MMLU Accuracy 61.87	24
General Language Understanding	10 Benchmarks Average (test)	Accuracy (Average) 62.1	15
Commonsense and Language Reasoning	Commonsense and Language Aggregate: HellaSwag, Winogrande, BoolQ (test)	Accuracy 69	6
Mathematical and Quantitative Reasoning	Mathematic and Quantitative (test)	Accuracy 47.9	6
