
DynamixSFT: Dynamic Mixture Optimization of Instruction Tuning Collections

About

As numerous instruction-tuning datasets continue to emerge during the post-training stage, dynamically balancing and optimizing their mixtures has become a critical challenge. To address this, we propose DynamixSFT, a dynamic and automated method for instruction-tuning dataset mixture optimization. We formulate the problem as a multi-armed bandit setup and introduce a Prior-scaled Boltzmann Exploration that softly anchors the updated sampling distribution to the original dataset proportions, thereby preserving the inherent diversity and coverage of the collection. Sampling probabilities are updated using a lightweight 1-Step Look-ahead Reward, reflecting how much the dataset contributes to improving the model's performance at its current state. When applied to the Tulu-v2-mixture collection comprising 16 instruction-tuning datasets, DynamixSFT achieves up to a 2.2% performance improvement across 10 benchmarks. Furthermore, we provide a comprehensive analysis and visualizations to offer deeper insights into the adaptive dynamics of our method.
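The sampling scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the form p_i ∝ prior_i · exp(reward_i / temperature), the exponential-moving-average reward update, and every name below (`prior_scaled_boltzmann`, `update_reward_estimate`, `temperature`, `step_size`) are assumptions chosen for illustration. The reward is passed in as a plain number; how the 1-Step Look-ahead Reward is actually computed is left out.

```python
import math
import random


def prior_scaled_boltzmann(priors, rewards, temperature=1.0):
    """Return sampling probabilities p_i proportional to prior_i * exp(reward_i / temperature).

    Assumed form, not taken from the paper. `priors` are the original dataset
    proportions (all > 0); `rewards` are the current per-dataset reward estimates.
    """
    if len(priors) != len(rewards):
        raise ValueError("priors and rewards must have the same length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Work in log space and subtract the max for numerical stability.
    logits = [math.log(p) + r / temperature for p, r in zip(priors, rewards)]
    max_logit = max(logits)
    weights = [math.exp(l - max_logit) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def update_reward_estimate(old_estimate, observed_reward, step_size=0.1):
    """Exponential moving average of observed rewards (an assumed update rule)."""
    return (1.0 - step_size) * old_estimate + step_size * observed_reward


if __name__ == "__main__":
    # Hypothetical three-dataset collection with made-up proportions.
    priors = [0.5, 0.3, 0.2]
    rewards = [0.0, 0.0, 0.0]

    # Pretend the third dataset just produced a positive reward.
    rewards[2] = update_reward_estimate(rewards[2], observed_reward=1.0)

    probs = prior_scaled_boltzmann(priors, rewards, temperature=0.5)
    chosen = random.choices(range(len(priors)), weights=probs, k=1)[0]
    print(probs, chosen)
```

With all reward estimates at zero, this sketch reproduces the original proportions exactly, which is one way to read "softly anchors the updated sampling distribution to the original dataset proportions". A larger `temperature` keeps the distribution closer to the prior, while a smaller one lets rewards dominate.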

Haebin Shin, Lei Ji, Xiao Liu, Zhiwei Yu, Qi Chen, Yeyun Gong • 2025

Related benchmarks

Task | Dataset | Result | Rank
Science and Knowledge Question Answering | Science and Knowledge Aggregate: SciQ, ARC-Easy, MedMCQA (test) | Accuracy 75.2 | 42
General Language Understanding | 10 Benchmarks Average (test) | Accuracy (Average) 62.1 | 15
Commonsense and Language Reasoning | Commonsense and Language Aggregate: HellaSwag, Winogrande, BoolQ (test) | Accuracy 69 | 6
Mathematical and Quantitative Reasoning | Mathematic and Quantitative (test) | Accuracy 47.9 | 6