Knockoff Nets: Stealing Functionality of Black-Box Models
About
Machine Learning (ML) models are increasingly deployed in the wild to perform a wide range of tasks. In this work, we ask to what extent an adversary can steal the functionality of such "victim" models based solely on blackbox interactions: image in, predictions out. In contrast to prior work, we consider an adversary lacking knowledge of the train/test data used by the model, its internals, and the semantics of its outputs. We formulate model functionality stealing as a two-step approach: (i) querying a set of input images to the blackbox model to obtain predictions; and (ii) training a "knockoff" on the resulting image-prediction pairs. We make multiple remarkable observations: (a) querying random images drawn from a distribution different from that of the blackbox's training data still results in a well-performing knockoff; (b) this is possible even when the knockoff uses a different architecture; and (c) our reinforcement learning approach additionally improves query sample efficiency in certain settings and provides performance gains. We validate model functionality stealing on a range of datasets and tasks, as well as on a popular image-analysis API, where we create a reasonable knockoff for as little as $30.
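The two-step recipe lends itself to a short sketch. The PyTorch code below is a minimal illustration under stated assumptions, not the paper's implementation: `victim_predict` stands in for the blackbox API, the ResNet-34 knockoff architecture and all hyperparameters are arbitrary example choices, and the surrogate query images are assumed to be supplied by the caller.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# --- Step (i): query the blackbox victim with surrogate images ---
# `victim_predict` is a hypothetical handle to the blackbox: an image batch
# in, softmax probabilities out. The adversary sees nothing else (no
# gradients, no architecture, no training data).

@torch.no_grad()
def query_blackbox(victim_predict, query_images, batch_size=64):
    """Collect victim predictions for a set of surrogate query images."""
    outputs = []
    for i in range(0, len(query_images), batch_size):
        outputs.append(victim_predict(query_images[i:i + batch_size]))
    return torch.cat(outputs)

# --- Step (ii): train the knockoff on the queried image-prediction pairs ---
# The knockoff may use a different architecture than the victim; a ResNet-34
# is chosen here purely as an example.

def train_knockoff(query_images, victim_probs, num_classes, epochs=10, lr=1e-3):
    knockoff = models.resnet34(num_classes=num_classes)
    opt = torch.optim.Adam(knockoff.parameters(), lr=lr)
    loader = DataLoader(TensorDataset(query_images, victim_probs),
                        batch_size=64, shuffle=True)
    knockoff.train()
    for _ in range(epochs):
        for x, p_victim in loader:
            # Distillation-style objective: match the victim's soft predictions.
            log_p_knockoff = F.log_softmax(knockoff(x), dim=1)
            loss = F.kl_div(log_p_knockoff, p_victim, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return knockoff
```

Training on the victim's full probability vectors (rather than hard labels) is the standard distillation-style choice; it transfers more information per query, which is part of why a knockoff can work even from out-of-distribution query images.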
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-100 (test) | -- | -- | 3518 |
| Image Classification | MNIST (test) | Accuracy | 98.81 | 882 |
| Image Classification | ImageNet-1K | Top-1 Accuracy | 57.43 | 836 |
| Image Classification | CIFAR-100 | Top-1 Accuracy | 58.49 | 622 |
| Image Classification | CIFAR-10 | -- | -- | 507 |
| Image Classification | MNIST | -- | -- | 395 |
| Image Classification | TinyImageNet (test) | -- | -- | 366 |
| Image Classification | Tiny-ImageNet | Top-1 Accuracy | 50.22 | 143 |
| Image Classification | SVHN (test) | Top-1 Accuracy | 83.37 | 26 |
| Model Stealing | CIFAR-10 (test) | -- | -- | 10 |