Label-Only Model Inversion Attacks via Boundary Repulsion
About
Recent studies show that state-of-the-art deep neural networks are vulnerable to model inversion attacks, in which access to a model is abused to reconstruct private training data of any given target class. Existing attacks rely on access to either the complete target model (whitebox) or the model's soft labels (blackbox). However, no prior work has addressed the harder but more practical scenario in which the attacker has access only to the model's predicted label, without a confidence measure. In this paper, we introduce an algorithm, Boundary-Repelling Model Inversion (BREP-MI), to invert private training data using only the target model's predicted labels. The key idea of our algorithm is to evaluate the model's predicted labels over a sphere and then estimate the direction to reach the target class's centroid. Using the example of face recognition, we show that the images reconstructed by BREP-MI successfully reproduce the semantics of the private training data for various datasets and target model architectures. We compare BREP-MI with state-of-the-art whitebox and blackbox model inversion attacks, and the results show that, despite assuming less knowledge about the target model, BREP-MI outperforms the blackbox attack and achieves results comparable to the whitebox attack.
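The key idea stated in the abstract (query predicted labels on a sphere, then estimate a direction toward the target class's centroid) can be illustrated with a toy sketch. This is not the authors' implementation. The nearest-centroid oracle `predict_label`, the function names, the ±1 weighting of sampled directions, the step size, and the radius growth factor are all assumptions chosen for illustration, and the search here runs directly in a small Euclidean space rather than on images.

```python
import numpy as np


def predict_label(x, centroids):
    """Toy label-only oracle (assumption): index of the nearest centroid.

    It returns a hard label only, with no confidence score.
    """
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))


def estimate_direction(x, radius, target, centroids, n_samples, rng):
    """Estimate a direction away from the decision boundary using labels only."""
    # Sample points uniformly on the unit sphere by normalizing Gaussian draws.
    u = rng.normal(size=(n_samples, x.shape[0]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # +1 if the sphere point is still labeled as the target class, -1 otherwise.
    signs = np.array([
        1.0 if predict_label(x + radius * ui, centroids) == target else -1.0
        for ui in u
    ])
    # Directions that leave the target class are subtracted, so the average
    # points away from the nearby boundary.
    direction = (signs[:, None] * u).mean(axis=0)
    return direction, bool(np.all(signs > 0))


def boundary_repulsion(x0, target, centroids, radius=0.5, step=0.3,
                       grow=1.3, n_samples=128, n_iters=300, seed=0):
    """Move x0 away from the target class's boundary (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iters):
        direction, all_inside = estimate_direction(
            x, radius, target, centroids, n_samples, rng)
        if all_inside:
            # Every sampled point is in the target class: enlarge the sphere.
            radius *= grow
        else:
            # Some points crossed the boundary: step away from it.
            x = x + step * direction
    return x, radius


if __name__ == "__main__":
    # Target class 0 sits at the origin, surrounded by four other classes.
    centroids = np.array([[0.0, 0.0], [4.0, 0.0], [-4.0, 0.0],
                          [0.0, 4.0], [0.0, -4.0]])
    x0 = np.array([1.5, 1.2])  # starts inside class 0, near its boundary
    x_final, r_final = boundary_repulsion(x0, target=0, centroids=centroids)
    print("start:", x0, "end:", x_final, "final radius:", r_final)
```

In this toy setting the target class occupies a bounded region, so repeatedly stepping away from the boundary and growing the sphere drives the point toward the middle of that region. Only hard labels from `predict_label` are ever used, which is the constraint the label-only threat model imposes.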
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Model Inversion | CelebA (test) | Attack Accuracy | 75.67 | 36 |
| Model Inversion Attack | CelebA | Attack Acc | 73.93 | 16 |
| Model Inversion Attack | Standard MI attack datasets (CelebA, Pubfig83, FFHQ, Facescrub) | Queries (M) | 17.98 | 15 |
| Model Inversion Attack | Pubfig83 | Attack Accuracy | 72.8 | 8 |
| Model Inversion Attack | Facescrub | Attack Accuracy | 40.2 | 8 |
| Model Inversion Attack | Pubfig83 (private identities) | Attack Accuracy | 0.66 | 4 |
| Model Inversion Attack | CelebA private identities | Attack Accuracy | 75.67 | 4 |
| Model Inversion Attack | Facescrub (private identities) | Attack Accuracy | 0.3568 | 4 |
| Model Inversion Attack | CelebA 128x128 resolution (test) | KNN Distance | 1.39e+3 | 4 |