Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations
About
In this paper, we propose a novel and practical mechanism that enables a service provider to verify whether a suspect model was stolen from the victim model via model extraction attacks. Our key insight is that the profile of a DNN model's decision boundary can be uniquely characterized by its Universal Adversarial Perturbations (UAPs). UAPs lie in a low-dimensional subspace, and the subspaces of piracy models are more consistent with the victim model's subspace than those of non-piracy models. Based on this, we propose a UAP-based fingerprinting method for DNN models and train an encoder via contrastive learning that takes fingerprints as input and outputs a similarity score. Extensive studies show that our framework can detect model IP breaches with confidence > 99.99% using only 20 fingerprints of the suspect model. It generalizes well across different model architectures and is robust against post-modifications of stolen models.
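The subspace-consistency idea above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (which uses a contrastive encoder); it only demonstrates the underlying intuition: stack each model's UAPs into a matrix, take the top-k right-singular vectors as that model's UAP subspace, and score how strongly one subspace projects onto another. All function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def uap_subspace(uaps, k=5):
    # uaps: (n, d) matrix whose rows are flattened UAPs of one model.
    # Return an orthonormal basis (k, d) of the top-k principal subspace.
    _, _, vt = np.linalg.svd(uaps, full_matrices=False)
    return vt[:k]

def subspace_similarity(basis_a, basis_b):
    # Mean squared projection of basis_a's vectors onto span(basis_b);
    # lies in [0, 1], with 1 meaning identical subspaces.
    proj = basis_a @ basis_b.T  # (k, k) inner products between basis vectors
    return float(np.mean(np.sum(proj ** 2, axis=1)))

# Synthetic stand-ins: a "piracy" model's UAPs are a perturbed copy of the
# victim's, while an independent model's UAPs are unrelated.
rng = np.random.default_rng(0)
victim = rng.normal(size=(20, 64))
piracy = victim + 0.05 * rng.normal(size=(20, 64))
independent = rng.normal(size=(20, 64))

v = uap_subspace(victim)
sim_piracy = subspace_similarity(v, uap_subspace(piracy))
sim_indep = subspace_similarity(v, uap_subspace(independent))
print(sim_piracy, sim_indep)  # piracy subspace aligns far more closely
```

In the paper's full pipeline, the contrastive encoder replaces this hand-crafted similarity with a learned score over fingerprint inputs, but the ordering it must recover is the same: piracy models score high, independent models score low.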
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | CIFAR-100 | Accuracy 71.1 | 435 |
| Model Fingerprinting | CIFAR-100 | AUC 90 | 52 |
| Ownership Verification | CIFAR-10 | AUC 100 | 49 |
| Ownership Verification | CIFAR-100 | AUC 89.7 | 49 |
| Model Fingerprinting | CIFAR-10 | AUC 85 | 47 |
| Model Fingerprinting | MNIST | AUC 0.966 | 47 |
| Model Fingerprinting | Tiny-ImageNet | AUC 0.981 | 45 |
| Model Fingerprinting | Fashion-MNIST | AUC 86.9 | 40 |
| Image Classification | CIFAR-100 | AUC 90.3 | 30 |
| Training Data Provenance Verification | CIFAR-10 | Avg AUC 79.63 | 27 |