
A System for Massively Parallel Hyperparameter Tuning

About

Modern learning models are characterized by large hyperparameter spaces and long training times. These properties, coupled with the rise of parallel computing and the growing demand to productionize machine learning workloads, motivate the need to develop mature hyperparameter optimization functionality in distributed computing settings. We address this challenge by first introducing a simple and robust hyperparameter optimization algorithm called ASHA, which exploits parallelism and aggressive early-stopping to tackle large-scale hyperparameter optimization problems. Our extensive empirical results show that ASHA outperforms existing state-of-the-art hyperparameter optimization methods; scales linearly with the number of workers in distributed settings; and is suitable for massive parallelism, as demonstrated on a task with 500 workers. We then describe several design decisions we encountered, along with our associated solutions, when integrating ASHA in Determined AI's end-to-end production-quality machine learning system that offers hyperparameter tuning as a service.
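The core idea sketched in the abstract is that ASHA promotes promising configurations up a ladder of "rungs" (increasing resource budgets) asynchronously: whenever a worker becomes idle, it either promotes a configuration that sits in the top 1/eta of its rung, or starts a fresh random configuration at the bottom rung. The following is a minimal single-process sketch of that promotion logic, under assumed names (`ASHA`, `get_job`, `report`) and a toy one-dimensional search space; it is an illustration of the scheme, not the authors' implementation.

```python
import math
import random

class ASHA:
    """Minimal sketch of Asynchronous Successive Halving (ASHA).

    Rung k trains configurations for min_resource * eta**k units of
    resource (e.g. epochs). A config in the top 1/eta of its rung is
    eligible for promotion to the next rung.
    """

    def __init__(self, min_resource=1, max_resource=27, eta=3):
        self.eta = eta
        self.min_resource = min_resource
        self.num_rungs = int(round(math.log(max_resource / min_resource, eta))) + 1
        # rung -> list of (config, loss); promoted[k] tracks configs already moved up
        self.rungs = [[] for _ in range(self.num_rungs)]
        self.promoted = [set() for _ in range(self.num_rungs)]

    def get_job(self):
        """Job for an idle worker: promote if possible, scanning from the
        top rung down; otherwise grow the bottom rung with a new config."""
        for k in reversed(range(self.num_rungs - 1)):
            by_loss = sorted(self.rungs[k], key=lambda r: r[1])
            top = by_loss[: len(by_loss) // self.eta]  # top 1/eta of rung k
            for cfg, _ in top:
                if cfg not in self.promoted[k]:
                    self.promoted[k].add(cfg)
                    return cfg, k + 1
        return random.random(), 0  # fresh config (toy space: one float)

    def report(self, cfg, rung, loss):
        self.rungs[rung].append((cfg, loss))

    def resource(self, rung):
        return self.min_resource * self.eta ** rung


# Simulated loop: each iteration stands in for one worker finishing a job.
random.seed(0)
asha = ASHA(min_resource=1, max_resource=27, eta=3)
for _ in range(200):
    cfg, rung = asha.get_job()
    # Toy objective: loss shrinks with more resource; optimum near cfg = 0.5.
    loss = abs(cfg - 0.5) + 1.0 / asha.resource(rung)
    asha.report(cfg, rung, loss)
```

Because promotion decisions never block on a synchronized round completing, idle workers always have work, which is what lets the method scale linearly with the number of workers in the distributed setting the abstract describes.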

Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, Ameet Talwalkar · 2018

Related benchmarks

Task                         Dataset                 Metric             Result   Rank
Hyperparameter Optimization  JAHS-C10 Bench (val)    Validation Error    9.739     52
Hyperparameter Optimization  JAHS-Bench CH (val)     Validation Error    5.492     31
Hyperparameter Optimization  JAHS-Bench FM (val)     Validation Error    5.126     28
Hyperparameter Optimization  ImageNet PD1 (val)      Validation Error   25.4       24
Hyperparameter Optimization  Cifar100 PD1 (val)      Validation Error   27.1       24
Hyperparameter Optimization  PD1-LM1B (val)          Validation Error    0.649     24
Hyperparameter Optimization  PD1 WMT (val)           Validation Error   39.6       24
