AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning
About
Deep neural networks have seen great success in recent years; however, training a deep model is often challenging because its performance heavily depends on the hyper-parameters used. In addition, finding the optimal hyper-parameter configuration, even with state-of-the-art (SOTA) hyper-parameter optimization (HPO) algorithms, can be time-consuming, requiring multiple training runs over the entire dataset for each candidate set of hyper-parameters. Our central insight is that using an informative subset of the dataset for the model training runs involved in hyper-parameter optimization allows us to find the optimal hyper-parameter configuration significantly faster. In this work, we propose AUTOMATA, a gradient-based subset selection framework for hyper-parameter tuning. We empirically evaluate the effectiveness of AUTOMATA in hyper-parameter tuning through several experiments on real-world datasets in the text, vision, and tabular domains. Our experiments show that using gradient-based data subsets for hyper-parameter tuning achieves significantly faster turnaround times, with speedups of 3$\times$-30$\times$, while achieving performance comparable to the hyper-parameters found using the entire dataset.
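To make the core idea concrete, the sketch below illustrates one common form of gradient-based subset selection: greedily picking samples whose summed gradient best approximates the full-dataset gradient. This is a toy illustration under assumed inputs (a precomputed matrix of per-sample gradients), not AUTOMATA's exact algorithm; the function name `greedy_gradient_match` and the unweighted-sum formulation are our own simplifications.

```python
import numpy as np

def greedy_gradient_match(per_sample_grads, budget):
    """Toy gradient-matching selection (illustrative sketch only).

    Greedily adds the sample whose gradient is most aligned with the
    residual between the full-dataset gradient and the current
    subset's summed gradient.
    """
    full_grad = per_sample_grads.sum(axis=0)
    selected = []
    subset_grad = np.zeros_like(full_grad)
    for _ in range(budget):
        # residual still unexplained by the chosen subset
        residual = full_grad - subset_grad
        # score every sample by alignment with the residual
        scores = per_sample_grads @ residual
        scores[selected] = -np.inf  # exclude already-chosen samples
        best = int(np.argmax(scores))
        selected.append(best)
        subset_grad += per_sample_grads[best]
    return selected

# assumed toy data: 100 samples, 10-dimensional gradient vectors
rng = np.random.default_rng(0)
grads = rng.normal(size=(100, 10))
subset = greedy_gradient_match(grads, budget=10)
```

Once a subset like this is selected, each HPO trial trains on the (much smaller) subset instead of the full dataset, which is where the reported speedups come from.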
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | FMNIST | Speedup: 5.24 | 21 |
| Image Classification | CIFAR10 | Speedup: 2.2 | 18 |
| Neural Architecture Search | NAS-Bench-101 CIFAR-10 (test) | -- | 18 |
| Image Classification | Tiny-ImageNet | Speedup: 2.08 | 14 |
| Image Classification | CIFAR100 | Speedup: 0.84 | 11 |
| Subset Selection | fMNIST (train) | Speedup: 5.24 | 10 |
| Image Classification | Caltech-256 | Speedup: 2.07 | 9 |
| Hyper-parameter optimization | CIFAR10 (test) | Test Error: 3.39 | 8 |