ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
About
Recently, learning algorithms motivated by the sharpness of the loss surface, an effective measure of the generalization gap, have shown state-of-the-art performance. However, sharpness defined over a rigid region with a fixed radius is sensitive to parameter re-scalings that leave the loss unchanged, which weakens the connection between sharpness and the generalization gap. In this paper, we introduce the concept of adaptive sharpness, which is scale-invariant, and propose a corresponding generalization bound. Building on this bound, we propose a novel learning method, adaptive sharpness-aware minimization (ASAM). Experimental results on various benchmark datasets show that ASAM significantly improves model generalization performance.
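To make the abstract's idea concrete: ASAM follows the two-pass scheme of sharpness-aware minimization, but scales the weight perturbation element-wise by a normalization operator (roughly T_w = |w|), so the perturbation is invariant to loss-preserving re-scalings of the parameters. Below is a minimal PyTorch-style sketch of one ASAM update under that reading; the helper name `asam_step` and the default values of `rho` and `eta` are illustrative assumptions, not taken from this page.

```python
import torch

def asam_step(model, loss_fn, x, y, base_optimizer, rho=0.5, eta=0.01):
    """One ASAM update (illustrative sketch): ascend to an adaptively
    scaled perturbation of the weights, then descend using the gradient
    taken at the perturbed point."""
    # First pass: gradient of the loss at the current weights w.
    loss = loss_fn(model(x), y)
    loss.backward()

    eps_list, grad_norm = [], 0.0
    with torch.no_grad():
        # Element-wise scaling t_w ~ |w| + eta makes the perturbation
        # scale-invariant (eta is a small stabilizer, an assumption here).
        for p in model.parameters():
            if p.grad is None:
                continue
            t_w = p.abs() + eta
            grad_norm = grad_norm + (t_w * p.grad).norm(p=2) ** 2
        grad_norm = grad_norm.sqrt()
        for p in model.parameters():
            if p.grad is None:
                continue
            t_w = p.abs() + eta
            # epsilon = rho * t_w^2 * grad / || t_w * grad ||
            eps = rho * (t_w ** 2) * p.grad / (grad_norm + 1e-12)
            p.add_(eps)              # move to w + epsilon
            eps_list.append((p, eps))

    model.zero_grad()
    # Second pass: gradient at the perturbed weights w + epsilon.
    loss_fn(model(x), y).backward()

    with torch.no_grad():
        for p, eps in eps_list:
            p.sub_(eps)              # restore the original weights w
    base_optimizer.step()            # descend with the perturbed gradient
    model.zero_grad()
    return loss.item()
```

In use, `base_optimizer` would be an ordinary optimizer such as `torch.optim.SGD(model.parameters(), lr=0.1)`, and `asam_step` would replace the usual `loss.backward(); optimizer.step()` pair in the training loop. Note the cost: each update requires two forward-backward passes.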
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-100 (test) | Accuracy | 89.9 | 3518 |
| Image Classification | CIFAR-10 (test) | Accuracy | 98.68 | 3381 |
| Image Classification | CIFAR10 (test) | Accuracy | 97.56 | 585 |
| Image Classification | ImageNet (test) | -- | -- | 235 |
| Machine Translation | IWSLT De-En 2014 (test) | BLEU | 35.02 | 146 |
| Image Classification | CIFAR100 (test) | Accuracy | 56.19 | 112 |
| Image Classification | CIFAR10 centralized performance (test) | Accuracy | 86.07 | 104 |
| Image Classification | ImageNet Robustness Suite | Top-1 Accuracy (ImageNet-A) | 92.99 | 42 |
| Image Classification | ImageNet Clean 1K (val) | -- | -- | 24 |
| Semantic Segmentation | Cityscapes and IDDA (test) | mIoU (Country Seen) | 46.57 | 15 |