
AutoMixer: Checkpoint Artifacts as Automatic Data Mixers

About

In language model training, it is desirable to equip models with capabilities from various tasks. However, it is not clear how to directly obtain the right data mixtures for these capabilities, as the relationship between data and tasks is difficult to model. In this work, we observe that checkpoint models exhibit emerging capabilities at different points in the training trajectory. Often, the training process saves checkpoints as artifacts that are under-utilized as a source of in-training data signals. We identify these artifact models based on their respective capabilities on the benchmarks and leverage them as data mixers by using their aggregated first-order influence approximation over source data. We demonstrate on eight reasoning benchmarks that the proposed framework yields significant improvements in the pretraining setting, with performance gains of up to 1.93%. Overall, this shows the potential of checkpoint models to enhance data quality and optimize data mixtures.
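The core idea above can be sketched in code. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes a TracIn-style first-order influence approximation (the dot product of a source example's training gradient with the benchmark-loss gradient at a checkpoint), aggregated over the selected checkpoint artifacts and normalized into mixture weights. All function names and the data layout are assumptions for illustration.

```python
import numpy as np

def influence_scores(bench_grad, example_grads):
    """First-order influence of each source example on a benchmark at one
    checkpoint: the dot product of the example's training gradient with the
    benchmark-loss gradient (a TracIn-style approximation)."""
    return example_grads @ bench_grad

def aggregate_mixture(checkpoints, sources):
    """Aggregate influence over the selected checkpoint artifacts and turn
    the per-source totals into a data-mixture distribution.

    checkpoints: list of dicts, each with
        "bench_grad":    benchmark-loss gradient vector at that checkpoint
        "example_grads": {source_name: (n_examples, dim) gradient matrix}
    """
    totals = {s: 0.0 for s in sources}
    for ckpt in checkpoints:
        for s in sources:
            scores = influence_scores(ckpt["bench_grad"],
                                      ckpt["example_grads"][s])
            totals[s] += scores.sum()
    # Clip negative (harmful) influence and normalize into mixture weights.
    clipped = {s: max(v, 0.0) for s, v in totals.items()}
    z = sum(clipped.values()) or 1.0
    return {s: v / z for s, v in clipped.items()}
```

In this sketch a source whose examples' gradients align with the benchmark gradient receives a larger share of the mixture; sources with net-negative influence are clipped to zero before normalization.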

Ernie Chang, Yang Li, Patrick Huber, Vish Vogeti, David Kant, Yangyang Shi, Vikas Chandra • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Commonsense Reasoning | HellaSwag | – | 1460 |
| Reasoning | ARC Easy | – | 183 |
| Question Answering | BoolQ | Delta Accuracy 2.16 | 15 |
| Question Answering | OBQA | Accuracy Improvement 2.01 | 12 |
| Reasoning | PIQA | Accuracy Improvement 2.05 | 12 |
| Reasoning | SIQA | Accuracy Improvement 2.12 | 12 |
| Reasoning | WinoGrande | Accuracy Improvement 2.14 | 12 |
| Reasoning | ARC Hard | Accuracy Improvement 0.55 | 12 |
