Theoretically Principled Trade-off between Robustness and Accuracy
About
We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown about the theory underlying this trade-off. In this work, we decompose the prediction error on adversarial examples (the robust error) as the sum of the natural (classification) error and the boundary error, and provide a differentiable upper bound using the theory of classification-calibrated losses, which we show is the tightest possible upper bound uniform over all probability distributions and measurable predictors. Inspired by our theoretical analysis, we also design a new defense method, TRADES, that trades adversarial robustness off against accuracy. Our proposed algorithm performs well experimentally on real-world datasets. The methodology formed the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge, in which we won first place out of ~2,000 submissions, surpassing the runner-up approach by $11.41\%$ in terms of mean $\ell_2$ perturbation distance.
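The decomposition above suggests a training objective with two terms: a standard classification loss on natural inputs (for accuracy) plus a regularizer that pushes the decision boundary away from the data (for robustness). As a minimal sketch of that surrogate, the snippet below combines cross-entropy on clean logits with a KL-divergence term between the model's predictions on clean and perturbed inputs, weighted by a trade-off coefficient `beta`; the function name, the NumPy setting, and the KL direction are our illustrative assumptions, not the authors' reference implementation (which performs the inner maximization over perturbations with projected gradient steps).

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def trades_loss(logits_nat, logits_adv, labels, beta=6.0):
    """Sketch of a TRADES-style surrogate on a batch:
    cross-entropy on natural inputs (accuracy term)
    + beta * KL(p_nat || p_adv)   (boundary / robustness term).
    `logits_adv` stands in for model outputs on adversarially
    perturbed inputs found by an inner maximization (not shown).
    """
    p_nat = softmax(logits_nat)
    p_adv = softmax(logits_adv)
    n = len(labels)
    # Accuracy term: negative log-likelihood of the true labels.
    ce = -np.log(p_nat[np.arange(n), labels] + 1e-12).mean()
    # Robustness term: divergence between clean and perturbed predictions.
    kl = (p_nat * (np.log(p_nat + 1e-12)
                   - np.log(p_adv + 1e-12))).sum(axis=-1).mean()
    return ce + beta * kl
```

Setting `beta = 0` recovers plain empirical risk minimization, while larger `beta` trades natural accuracy for robustness, mirroring the trade-off the analysis formalizes.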
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-100 (test) | Accuracy | 62.37 | 3518 |
| Image Classification | CIFAR-10 (test) | Accuracy | 88.64 | 3381 |
| Image Classification | MNIST | Accuracy | 99.4 | 395 |
| Image Classification | TinyImageNet (test) | Accuracy | 38.51 | 366 |
| Image Classification | CIFAR-10 (test) | Accuracy (Clean) | 85.9 | 273 |
| Image Classification | Fashion MNIST | Accuracy | 78.82 | 225 |
| Image Classification | CIFAR10 (train) | Accuracy | 98.98 | 90 |
| Image Classification | GTSRB | Natural Accuracy | 72.3 | 87 |
| Adversarial Robustness | CIFAR-10 (test) | -- | -- | 76 |
| Image Classification | Tiny-ImageNet 1.0 (test) | Accuracy (Natural) | 60.8 | 75 |