Batch Loss Score for Dynamic Data Pruning
About
Dynamic data pruning accelerates deep learning by selectively omitting less informative samples during training. Per-sample loss is a common importance metric, but obtaining it can be difficult or infeasible for complex models or loss functions, and often requires significant implementation effort. This work proposes the Batch Loss Score (BLS), a computationally efficient alternative that uses an Exponential Moving Average (EMA) of readily available batch losses to assign scores to individual samples. From the perspective of a single sample, the batch loss is framed as a noisy measurement of that sample's scaled individual loss, where the noise comes from stochastic batch composition. It is formally shown that the EMA acts as a first-order low-pass filter that attenuates this high-frequency batch-composition noise. The resulting score approximates the smoothed, persistent contribution of the individual sample to the loss, which provides a theoretical grounding for BLS as a proxy for sample importance. BLS is simple to integrate (a three-line code injection) and readily adapts existing per-sample loss-based methods (a one-line proxy). Its effectiveness is demonstrated by enhancing two such methods to prune 20%-50% of samples without performance loss across 14 datasets, 11 tasks and 18 models, showing broad applicability, especially in complex scenarios where per-sample loss is difficult to access. Code is available at https://github.com/mrazhou/BLS.
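The scoring idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the released implementation: the class `BatchLossScore`, its `momentum` parameter (the EMA weight α, so that score ← α·score + (1−α)·batch loss), the first-visit initialisation, the `prune_below_mean` rule and the `simulate` toy are all hypothetical names and choices made for illustration. It assumes a mean-reduced batch loss, so for sample i in a batch of size B the observed value is ℓ_i/B + (1/B)·Σ_{j≠i} ℓ_j: the first term is the scaled individual loss and the second is the batch-composition noise.

```python
import random
from statistics import fmean
from typing import Iterable, List, Sequence


class BatchLossScore:
    """Hypothetical helper: per-sample EMA of the scalar batch loss.

    Every sample in a batch receives the same observation (the batch loss);
    the EMA smooths out the noise caused by random batch composition.
    """

    def __init__(self, num_samples: int, momentum: float = 0.9) -> None:
        if not 0.0 <= momentum < 1.0:
            raise ValueError("momentum must be in [0, 1)")
        self.momentum = momentum
        self.scores: List[float] = [0.0] * num_samples
        self._seen: List[bool] = [False] * num_samples

    def update(self, indices: Iterable[int], batch_loss: float) -> None:
        # Called once per training step with the dataset indices of the batch
        # and the scalar (mean) batch loss that is already computed for backprop.
        for i in indices:
            if self._seen[i]:
                self.scores[i] = (self.momentum * self.scores[i]
                                  + (1.0 - self.momentum) * batch_loss)
            else:
                # First visit: start the EMA at the observation instead of 0.
                self.scores[i] = batch_loss
                self._seen[i] = True


def prune_below_mean(scores: Sequence[float], ratio: float,
                     rng: random.Random) -> List[int]:
    """Hypothetical pruning rule; returns the indices that are KEPT.

    Each below-mean sample is dropped with probability `ratio`. The rule only
    needs one score per sample, so per-sample losses or BLS scores can be
    passed in interchangeably.
    """
    mean = fmean(scores)
    return [i for i, s in enumerate(scores)
            if s >= mean or rng.random() >= ratio]


def simulate(num_samples: int = 100, batch_size: int = 10, epochs: int = 300,
             momentum: float = 0.95, seed: int = 0) -> List[float]:
    """Toy check: the first half are 'hard' (loss 2.0), the rest 'easy' (0.1)."""
    rng = random.Random(seed)
    true_loss = [2.0 if i < num_samples // 2 else 0.1 for i in range(num_samples)]
    bls = BatchLossScore(num_samples, momentum)
    order = list(range(num_samples))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, num_samples, batch_size):
            batch = order[start:start + batch_size]
            # Only the mean batch loss is observed, never per-sample losses.
            batch_loss = fmean(true_loss[i] for i in batch)
            bls.update(batch, batch_loss)
    return bls.scores
```

In this toy, `simulate()` gives the "hard" samples (indices 0 to 49) a clearly higher mean score than the "easy" ones, even though only batch means were ever observed. One way to see the low-pass effect: for white-noise input of variance σ², an EMA with weight α has stationary output variance (1−α)/(1+α)·σ², roughly a 39-fold reduction for α = 0.95. In a PyTorch loop, the injection would amount to calling `bls.update(indices.tolist(), loss.item())` after the loss is computed, assuming (hypothetically) that the data loader also yields dataset indices.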
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Detection | COCO 2017 (val) | AP | 22.2 | 2643 |
| Instance Segmentation | COCO 2017 (val) | -- | -- | 1201 |
| Image Classification | CIFAR100 | Accuracy | 78.5 | 102 |
| Image Classification | CIFAR10 | Accuracy | 95.6 | 91 |
| Image Captioning | NoCaps 1.0 (val) | Overall Score | 65.3 | 32 |
| Image Classification | ImageNet-1K | Accuracy | 80 | 18 |
| Image Classification | CIFAR100 | Accuracy | 80.7 | 6 |
| Image Captioning | COCO | BLEU@4 | 27.2 | 3 |
| Multi-view Stereo | WHU-MVS | Accuracy (<3 units) | 95.17 | 3 |
| Image Classification | CIFAR100 (test) | Top-1 Accuracy | 58 | 3 |