Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations

About

In this paper, we propose a novel and practical mechanism which enables the service provider to verify whether a suspect model is stolen from the victim model via model extraction attacks. Our key insight is that the profile of a DNN model's decision boundary can be uniquely characterized by its Universal Adversarial Perturbations (UAPs). UAPs belong to a low-dimensional subspace and piracy models' subspaces are more consistent with victim model's subspace compared with non-piracy model. Based on this, we propose a UAP fingerprinting method for DNN models and train an encoder via contrastive learning that takes fingerprint as inputs, outputs a similarity score. Extensive studies show that our framework can detect model IP breaches with confidence > 99.99 within only 20 fingerprints of the suspect model. It has good generalizability across different model architectures and is robust against post-modifications on stolen models.

Zirui Peng, Shaofeng Li, Guoxing Chen, Cheng Zhang, Haojin Zhu, Minhui Xue• 2022

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR-100	Accuracy71.1	435
Model Fingerprinting	CIFAR-100	AUC90	52
Ownership Verification	CIFAR-10	AUC100	49
Ownership Verification	CIFAR-100	AUC89.7	49
Model Fingerprinting	CIFAR-10	AUC85	47
Model Fingerprinting	MNIST	AUC0.966	47
Model Fingerprinting	Tiny-ImageNet	AUC0.981	45
Model Fingerprinting	Fashion MNIST	AUC86.9	40
Image Classification	CIFAR100	AUC90.3	30
Training Data Provenance Verification	CIFAR10	Avg AUC79.63	27

Showing 10 of 19 rows

Other info

Follow for update

@wizwand_team Discord