
Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations

About

In this paper, we propose a novel and practical mechanism that enables a service provider to verify whether a suspect model was stolen from a victim model via model extraction attacks. Our key insight is that the profile of a DNN model's decision boundary can be uniquely characterized by its Universal Adversarial Perturbations (UAPs). UAPs lie in a low-dimensional subspace, and the subspaces of piracy models are more consistent with the victim model's subspace than those of non-piracy models. Based on this, we propose a UAP-based fingerprinting method for DNN models and train an encoder via contrastive learning that takes fingerprints as input and outputs a similarity score. Extensive studies show that our framework can detect model IP breaches with confidence > 99.99% using only 20 fingerprints of the suspect model. It generalizes well across different model architectures and is robust against post-modifications of stolen models.
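The key insight, that UAPs lie in a low-dimensional subspace and that piracy models' UAPs align more closely with the victim's subspace, can be illustrated with a minimal sketch. The paper's actual pipeline trains a contrastive encoder to produce the similarity score; the toy code below (function names are our own, not from the paper) only shows the underlying subspace-consistency measure: build an orthonormal basis for the victim's UAP subspace via SVD, then measure how much of a suspect UAP's energy falls inside that subspace.

```python
import numpy as np

def subspace_basis(uaps, k):
    """Orthonormal basis (d, k) for the top-k subspace spanned by
    a (n, d) matrix of flattened victim-model UAPs, via SVD."""
    _, _, vt = np.linalg.svd(uaps, full_matrices=False)
    return vt[:k].T

def alignment_score(basis, u):
    """Fraction of a suspect UAP's energy lying in the victim subspace:
    ||P u|| / ||u||, where P projects onto span(basis).
    Scores near 1 suggest a piracy model, near 0 an independent model."""
    proj = basis @ (basis.T @ u)
    return np.linalg.norm(proj) / np.linalg.norm(u)
```

A UAP drawn from inside the victim subspace scores 1; a random direction in a high-dimensional space scores much lower, which is what separates piracy from non-piracy models in this picture.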

Zirui Peng, Shaofeng Li, Guoxing Chen, Cheng Zhang, Haojin Zhu, Minhui Xue• 2022

Related benchmarks

Task                                  | Dataset       | Result       | Rank
--------------------------------------|---------------|--------------|-----
Image Classification                  | CIFAR-100     | Accuracy 71.1| 435
Model Fingerprinting                  | CIFAR-100     | AUC 90       | 52
Ownership Verification                | CIFAR-10      | AUC 100      | 49
Ownership Verification                | CIFAR-100     | AUC 89.7     | 49
Model Fingerprinting                  | CIFAR-10      | AUC 85       | 47
Model Fingerprinting                  | MNIST         | AUC 0.966    | 47
Model Fingerprinting                  | Tiny-ImageNet | AUC 0.981    | 45
Model Fingerprinting                  | Fashion MNIST | AUC 86.9     | 40
Image Classification                  | CIFAR100      | AUC 90.3     | 30
Training Data Provenance Verification | CIFAR10       | Avg AUC 79.63| 27

Showing 10 of 19 rows
