DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections
About
As instruction-tuning datasets for the post-training stage continue to proliferate, dynamically balancing and optimizing their mixtures has become a critical challenge. To address this, we propose DynamixSFT, a dynamic and automated method for optimizing instruction-tuning dataset mixtures. We formulate the problem as a multi-armed bandit and introduce Prior-scaled Boltzmann Exploration, which softly anchors the updated sampling distribution to the original dataset proportions and thereby preserves the inherent diversity and coverage of the collection. Sampling probabilities are updated using a lightweight 1-Step Look-ahead Reward that reflects how much each dataset contributes to improving the model's performance at its current state. When applied to the Tulu-v2-mixture collection, which comprises 16 instruction-tuning datasets, DynamixSFT achieves up to a 2.2% performance improvement across 10 benchmarks. Furthermore, we provide a comprehensive analysis and visualizations that offer deeper insight into the adaptive dynamics of our method.
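The abstract does not state the exact update rule, so the following is only a minimal sketch of how a prior-anchored bandit sampler of this kind might look. It assumes that each dataset's sampling probability is proportional to its original proportion multiplied by a Boltzmann factor `exp(reward / temperature)`, that the look-ahead reward is the drop in loss after a single update step, and that reward estimates are smoothed with an exponential moving average. The function names, the `temperature` and `step_size` parameters, and these formulas are all illustrative assumptions, not the authors' implementation.

```python
import math


def prior_scaled_boltzmann(priors, rewards, temperature=1.0):
    """Return probabilities proportional to prior_i * exp(reward_i / temperature).

    Assumed form for illustration only: with equal rewards the result
    reduces to the original dataset proportions (the prior).
    """
    if len(priors) != len(rewards):
        raise ValueError("priors and rewards must have the same length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum reward for numerical stability; the
    # normalized probabilities are unchanged by this shift.
    max_reward = max(rewards)
    weights = [
        p * math.exp((r - max_reward) / temperature)
        for p, r in zip(priors, rewards)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def lookahead_reward(loss_before, loss_after):
    """Hypothetical 1-step look-ahead reward: loss reduction after one update."""
    return loss_before - loss_after


def update_reward_estimate(old_estimate, observed_reward, step_size=0.1):
    """Exponential moving average of observed rewards (an assumed smoothing rule)."""
    return (1.0 - step_size) * old_estimate + step_size * observed_reward


# Usage: three hypothetical datasets with original proportions 50/30/20.
priors = [0.5, 0.3, 0.2]
estimates = [0.0, 0.0, 0.0]
# Suppose one look-ahead step on the third dataset reduced the loss from 2.0 to 1.5.
estimates[2] = update_reward_estimate(estimates[2], lookahead_reward(2.0, 1.5))
probabilities = prior_scaled_boltzmann(priors, estimates, temperature=0.5)
```

Under these assumptions, a lower `temperature` lets the reward estimates pull the distribution further from the prior, while a higher one keeps sampling close to the original proportions.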
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Science and Knowledge Question Answering | Science and Knowledge Aggregate: SciQ, ARC-Easy, MedMCQA (test) | Accuracy | 75.2 | 42 |
| General Language Understanding | 10 Benchmarks Average (test) | Accuracy (Average) | 62.1 | 15 |
| Commonsense and Language Reasoning | Commonsense and Language Aggregate: HellaSwag, Winogrande, BoolQ (test) | Accuracy | 69 | 6 |
| Mathematical and Quantitative Reasoning | Mathematic and Quantitative (test) | Accuracy | 47.9 | 6 |