
Enhance the Visual Representation via Discrete Adversarial Training

About

Adversarial Training (AT), commonly accepted as one of the most effective defenses against adversarial examples, can substantially harm standard performance and is therefore of limited use in industrial-scale production and applications. Surprisingly, the opposite holds in Natural Language Processing (NLP) tasks, where AT can even benefit generalization. We observe that the merit of AT in NLP tasks may derive from the discrete and symbolic input space. To borrow this advantage from NLP-style AT, we propose Discrete Adversarial Training (DAT). DAT leverages VQGAN to convert image data into discrete, text-like inputs, i.e. visual words. It then minimizes the maximal risk on such discrete images under symbolic adversarial perturbations. We further explain the effectiveness of DAT from the perspective of distribution. As a plug-and-play technique for enhancing visual representation, DAT achieves significant improvements on multiple tasks, including image classification, object detection and self-supervised learning. In particular, the model pre-trained with Masked Auto-Encoding (MAE) and fine-tuned with DAT, without extra data, reaches 31.40 mCE on ImageNet-C and 32.77% top-1 accuracy on Stylized-ImageNet, establishing a new state of the art. The code will be available at https://github.com/alibaba/easyrobust.
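
The two steps described above (map the input to discrete visual words, then train on the worst-case symbolic perturbation) can be illustrated with a toy sketch. This is not the authors' implementation: the five-entry scalar `CODEBOOK` stands in for a VQGAN codebook, the greedy token-substitution search in `worst_case_tokens` is an assumed stand-in for the inner maximization, and the linear logistic classifier and all function names are hypothetical.

```python
import math

# Hypothetical 1-D codebook standing in for a VQGAN codebook of "visual words".
CODEBOOK = [-1.0, -0.5, 0.0, 0.5, 1.0]


def quantize(pixels):
    """Map each continuous value to the index of its nearest codeword."""
    return [min(range(len(CODEBOOK)), key=lambda k: abs(CODEBOOK[k] - p))
            for p in pixels]


def decode(tokens):
    """Map token indices back to codeword values."""
    return [CODEBOOK[t] for t in tokens]


def logit(weights, bias, tokens):
    """Score of a linear classifier applied to the decoded tokens."""
    return sum(w * v for w, v in zip(weights, decode(tokens))) + bias


def loss(weights, bias, tokens, label):
    """Logistic loss for a binary label in {0, 1}, computed stably."""
    z = logit(weights, bias, tokens)
    s = z if label == 1 else -z
    if s > 0:
        return math.log1p(math.exp(-s))
    return -s + math.log1p(math.exp(s))


def worst_case_tokens(weights, bias, tokens, label, budget=1):
    """Greedy inner maximization (assumed): apply up to `budget`
    single-step swaps to an adjacent codeword, each time choosing
    the swap that increases the loss the most."""
    adv = list(tokens)
    for _ in range(budget):
        best, best_loss = None, loss(weights, bias, adv, label)
        for i, t in enumerate(adv):
            for cand in (t - 1, t + 1):
                if 0 <= cand < len(CODEBOOK):
                    trial = adv[:i] + [cand] + adv[i + 1:]
                    trial_loss = loss(weights, bias, trial, label)
                    if trial_loss > best_loss:
                        best, best_loss = trial, trial_loss
        if best is None:  # no swap increases the loss
            break
        adv = best
    return adv


def train_step(weights, bias, pixels, label, lr=0.1, budget=1):
    """Outer minimization: one gradient step on the worst-case tokens."""
    adv = worst_case_tokens(weights, bias, quantize(pixels), label, budget)
    p = 1.0 / (1.0 + math.exp(-logit(weights, bias, adv)))
    grad = p - label  # derivative of the logistic loss w.r.t. the logit
    new_weights = [w - lr * grad * v for w, v in zip(weights, decode(adv))]
    return new_weights, bias - lr * grad, adv


if __name__ == "__main__":
    w, b = [1.0, -1.0, 0.5], 0.0
    pixels, label = [0.9, -0.4, 0.1], 1
    clean = quantize(pixels)
    w2, b2, adv = train_step(w, b, pixels, label)
    print("clean tokens:", clean, "adversarial tokens:", adv)
    print("loss on adversarial tokens before/after step:",
          loss(w, b, adv, label), loss(w2, b2, adv, label))
```

Because the unperturbed tokens are always the starting point of the search, the adversarial loss in this sketch can never be lower than the clean loss. A real system would replace the codebook, the classifier and the search with learned, far larger components.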

Xiaofeng Mao, Yuefeng Chen, Ranjie Duan, Yao Zhu, Gege Qi, Shaokai Ye, Xiaodan Li, Rong Zhang, Hui Xue • 2022

Related benchmarks

Task                   Dataset            Metric                    Result  Rank
Semantic segmentation  ADE20K             mIoU                      35.85   1024
Image Classification   ImageNet-1k (val)  Top-1 Accuracy            83.1    844
Image Classification   ImageNet A         Top-1 Acc                 68.92   654
Image Classification   ImageNet V2        Top-1 Acc                 78.82   611
Image Classification   ImageNet-R         Top-1 Acc                 65.61   529
Image Classification   ImageNet-Sketch    Top-1 Accuracy            50.03   407
Classification         Cars               Accuracy                  53.09   395
Image Classification   PACS               Overall Average Accuracy  72.6    241
Image Classification   ImageNet-1k (val)  Accuracy                  81.5    199
Object Detection       COCO               mAP                       40.41   137
Showing 10 of 41 rows
