Robust and Generalizable Visual Representation Learning via Random Convolutions

About

While successful for various computer vision tasks, deep neural networks have shown to be vulnerable to texture style shifts and small perturbations to which humans are robust. In this work, we show that the robustness of neural networks can be greatly improved through the use of random convolutions as data augmentation. Random convolutions are approximately shape-preserving and may distort local textures. Intuitively, randomized convolutions create an infinite number of new domains with similar global shapes but random local textures. Therefore, we explore using outputs of multi-scale random convolutions as new images or mixing them with the original images during training. When applying a network trained with our approach to unseen domains, our method consistently improves the performance on domain generalization benchmarks and is scalable to ImageNet. In particular, in the challenging scenario of generalizing to the sketch domain in PACS and to ImageNet-Sketch, our method outperforms state-of-art methods by a large margin. More interestingly, our method can benefit downstream tasks by providing a more robust pretrained visual representation.

Zhenlin Xu, Deyi Liu, Junlin Yang, Colin Raffel, Marc Niethammer• 2020

Related benchmarks

Task	Dataset	Result
Image Classification	ImageNet-Sketch	Top-1 Accuracy18.1	473
Image Classification	PACS (test)	Average Accuracy67.5	279
Domain Generalization	PACS (test)	Average Accuracy86.63	225
Image Classification	CIFAR-10-C	Accuracy71.23	179
Cardiac Segmentation	ACDC (test)	Avg Dice87.94	162
Semantic segmentation	Mapillary (val)	mIoU32.43	153
Image Classification	PACS	Accuracy62.76	130
Semantic segmentation	BDD-100K (val)	mIoU30.92	102
Semantic segmentation	SYNTHIA (val)	mIoU24.45	71
Optic Cup / Disc Segmentation	Fundus Domain 2	DC (Cup)76.05	53

Showing 10 of 45 rows

Other info

Follow for update

@wizwand_team Discord