
CUDA: Convolution-based Unlearnable Datasets

About

Large-scale training of modern deep learning models relies heavily on publicly available data on the web. This potentially unauthorized usage of online data leads to concerns regarding data privacy. Recent works aim to make data unlearnable for deep learning models by adding small, specially designed noise to tackle this issue. However, these methods are vulnerable to adversarial training (AT) and/or are computationally heavy. In this work, we propose a novel, model-free, Convolution-based Unlearnable DAtaset (CUDA) generation technique. CUDA is generated using controlled class-wise convolutions with filters that are randomly generated via a private key. CUDA encourages the network to learn the relation between filters and labels rather than informative features for classifying the clean data. We provide a theoretical analysis showing that CUDA can successfully poison Gaussian mixture data by reducing the clean data performance of the optimal Bayes classifier. We also empirically demonstrate the effectiveness of CUDA with various datasets (CIFAR-10, CIFAR-100, ImageNet-100, and Tiny-ImageNet) and architectures (ResNet-18, VGG-16, Wide ResNet-34-10, DenseNet-121, DeIT, EfficientNetV2-S, and MobileNetV2). Our experiments show that CUDA is robust to various data augmentations and training approaches such as smoothing, AT with different budgets, transfer learning, and fine-tuning. For instance, training a ResNet-18 on ImageNet-100 CUDA achieves only 8.96%, 40.08%, and 20.58% clean test accuracies with empirical risk minimization (ERM), $L_{\infty}$ AT, and $L_{2}$ AT, respectively. Here, ERM on the clean training data achieves a clean test accuracy of 80.66%. CUDA exhibits an unlearnability effect with ERM even when only a fraction of the training dataset is perturbed. Furthermore, we also show that CUDA is robust to adaptive defenses designed specifically to break it.

Vinu Sankar Sadasivan, Mahdi Soltanolkotabi, Soheil Feizi • 2023
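
The abstract describes generating the dataset by convolving every image of a class with a filter drawn at random from a private key. The sketch below illustrates that idea only; it is not the authors' implementation. The function names, the uniform filter distribution, the `blur` parameter, the fixed center weight, the sum-to-one normalization, and the edge padding are all assumptions made for illustration.

```python
import numpy as np

def make_class_filters(num_classes, kernel_size, key, blur=0.3):
    """Draw one random k x k filter per class, reproducible from `key`.

    Assumption: off-center weights are uniform in [0, blur] and the
    center weight is 1, so the original image stays dominant.
    """
    rng = np.random.default_rng(key)  # the private key acts as the seed
    filters = rng.uniform(0.0, blur, size=(num_classes, kernel_size, kernel_size))
    center = kernel_size // 2
    filters[:, center, center] = 1.0
    # Normalize each filter to sum to 1 so overall brightness is preserved.
    return filters / filters.sum(axis=(1, 2), keepdims=True)

def convolve2d_same(channel, kernel):
    """'Same'-size 2D convolution of one channel with an odd-sized kernel."""
    k = kernel.shape[0]
    padded = np.pad(channel, k // 2, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    # Flip the kernel so this is a convolution rather than a correlation.
    return np.einsum("ijkl,kl->ij", windows, kernel[::-1, ::-1])

def perturb_dataset(images, labels, filters):
    """Apply each image's class filter to every channel.

    images: float array of shape (N, H, W, C) with values in [0, 1]
    labels: int array of shape (N,)
    """
    out = np.empty_like(images)
    for i, (img, y) in enumerate(zip(images, labels)):
        for ch in range(img.shape[-1]):
            out[i, ..., ch] = convolve2d_same(img[..., ch], filters[y])
    return np.clip(out, 0.0, 1.0)

# Toy usage on random "images" with two classes.
rng = np.random.default_rng(0)
images = rng.random((4, 8, 8, 3))
labels = np.array([0, 1, 0, 1])
filters = make_class_filters(num_classes=2, kernel_size=3, key=1234)
poisoned = perturb_dataset(images, labels, filters)
```

Because the filters depend only on the key and the class label, anyone holding the key can regenerate them, and no model needs to be trained to produce the perturbation, which matches the "model-free" description in the abstract.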

Related benchmarks

Task | Dataset | Result | Rank
Image Classification | CIFAR-10 (test) | Accuracy 89.49 | 3381
Semantic segmentation | ADE20K (val) | mIoU 19.6 | 2731
Instance Segmentation | COCO 2017 (val) | -- | 1144
Semantic segmentation | Cityscapes (val) | mIoU 65.8 | 287
Panoptic Segmentation | Cityscapes (val) | PQ 51.6 | 276
Instance Segmentation | Cityscapes (val) | AP 29.9 | 239
Panoptic Segmentation | COCO 2017 (val) | PQ 6.7 | 172
Panoptic Segmentation | ADE20K (val) | PQ 10.7 | 89
Object Detection | COCO (test) | mAP 46.8 | 35
Panoptic Segmentation | COCO (test) | PQ 6.7 | 23
Showing 10 of 17 rows
