# BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach

## About
Bilevel optimization (BO) is useful for solving a variety of important machine learning problems, including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning. Conventional BO methods need to differentiate through the lower-level optimization process with implicit differentiation, which requires expensive computations involving the Hessian matrix. First-order methods for BO have recently attracted growing interest, but those proposed to date tend to be complicated and impractical for large-scale deep learning applications. In this work, we propose a simple first-order BO algorithm that depends only on first-order gradient information, requires no implicit differentiation, and is practical and efficient for large-scale non-convex functions in deep learning. We provide a non-asymptotic convergence analysis of the proposed method to stationary points for non-convex objectives and present empirical results that show its superior practical performance.
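Below is a minimal sketch of the first-order value-function idea described above: the lower-level problem is approximately solved with a few plain gradient steps, and the outer update combines the gradients of the outer objective and of the value-function gap, with no differentiation through the inner loop. The toy objectives `f` and `g`, the multiplier rule, and all hyperparameters (`eta`, `lr`, `T`) are illustrative assumptions, not the paper's experimental setup.

```python
import jax
import jax.numpy as jnp

# Toy bilevel problem (illustrative, not from the paper):
#   outer: min_x f(x, y*(x)) = ||y*(x) - 1||^2
#   inner: y*(x) = argmin_y g(x, y) = ||y - x||^2, so y*(x) = x
def f(x, y):
    return jnp.sum((y - 1.0) ** 2)

def g(x, y):
    return jnp.sum((y - x) ** 2)

def inner_steps(x, y, T=8, lr=0.1):
    # Approximate the lower-level solution with T first-order steps on g.
    for _ in range(T):
        y = y - lr * jax.grad(g, argnums=1)(x, y)
    return y

def first_order_bilevel_step(x, y, eta=0.5, lr=0.1, T=8):
    # Value-function gap q(x, y) = g(x, y) - g(x, y_T) >= 0 approximately,
    # where y_T stands in for the lower-level minimizer. stop_gradient makes
    # explicit that we never differentiate through the inner loop.
    y_T = jax.lax.stop_gradient(inner_steps(x, y, T=T, lr=lr))
    q = lambda xx, yy: g(xx, yy) - g(xx, y_T)
    gq_x, gq_y = jax.grad(q, argnums=(0, 1))(x, y)
    gf_x, gf_y = jax.grad(f, argnums=(0, 1))(x, y)
    # Multiplier chosen so the combined direction still decreases q;
    # phi = eta * ||grad q||^2 is one hedged choice of the slack term.
    q_sq = jnp.sum(gq_x**2) + jnp.sum(gq_y**2)
    inner_prod = jnp.sum(gf_x * gq_x) + jnp.sum(gf_y * gq_y)
    lam = jnp.maximum(eta * q_sq - inner_prod, 0.0) / (q_sq + 1e-12)
    # Plain gradient update: only first-order information is used.
    x = x - lr * (gf_x + lam * gq_x)
    y = y - lr * (gf_y + lam * gq_y)
    return x, y

x, y = jnp.array([0.0]), jnp.array([0.5])
for _ in range(200):
    x, y = first_order_bilevel_step(x, y)
# On this toy problem, x and y should both approach 1.0,
# the solution of the bilevel problem.
```

Note that each step costs only `T + 1` gradient evaluations of `g` and one of `f`; no Hessian-vector products or implicit differentiation appear anywhere, which is what makes this style of method viable at deep learning scale.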
## Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Data Poisoning Defense | CIFAR-10 (test) | Test Accuracy | 64.3 | 76 |
| Continual Learning | PMNIST (test) | Accuracy | 80.7 | 17 |
| Continual Learning | PMNIST | Accuracy | 80.7 | 8 |
| Continual Learning | Split CIFAR | Accuracy | 68.16 | 8 |
| Data Poisoning | MNIST (test) | Clean Accuracy | 98.02 | 8 |
| Sample Unlearning | CIFAR-10 (test) | Accuracy (ResNet18) | 48.09 | 6 |
| Continual Learning | Split CIFAR (test) | ACC | 68.16 | 5 |