PAC-Private Responses with Adversarial Composition

About

Modern machine learning models are increasingly deployed behind APIs. In this setting, standard weight-privatization methods (e.g., DP-SGD) add more noise than necessary and sacrifice utility. While model weights may vary significantly across training datasets, model responses to specific inputs are much lower dimensional and more stable. This motivates enforcing privacy guarantees directly on model outputs. We approach this under PAC privacy, which provides instance-based privacy guarantees for arbitrary black-box functions by controlling mutual information (MI). Importantly, PAC privacy explicitly rewards output stability with reduced noise levels. However, a central challenge remains: response privacy requires composing a large number of adaptively chosen, potentially adversarial queries issued by untrusted users, a setting in which existing composition results for PAC privacy are inadequate. We introduce a new algorithm that achieves adversarial composition via adaptive noise calibration, and we prove that mutual information guarantees accumulate linearly under adaptive and adversarial querying. Experiments across tabular, vision, and NLP tasks show that our method achieves high utility at extremely small per-query privacy budgets. On CIFAR-10, we achieve 87.79% accuracy with a per-step MI budget of $2^{-32}$. This enables serving one million queries while provably bounding membership inference attack (MIA) success rates to 51.08%, the same guarantee as $(0.04, 10^{-5})$-DP. Furthermore, we show that private responses can be used to label public data to distill a publishable privacy-preserving model; using an ImageNet subset as a public dataset, our model distilled from 210,000 responses achieves 91.86% accuracy on CIFAR-10 with MIA success upper-bounded by 50.49%, which is comparable to $(0.02,10^{-5})$-DP.

Xiaochen Zhu, Mayuri Sridhar, Srinivas Devadas • 2026

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR-100 (test)	Accuracy 77.79	3518
Image Classification	CIFAR-10 (test)	Accuracy 95.8	906
Text Classification	AG News (test)	Accuracy 90.44	293
Binary Classification	Income (test)	Test Accuracy 87.17	34
Text Classification	IMDB reviews (test)	Accuracy 85.13	14
Tabular Classification	Bank Marketing (test)	Accuracy 0.9169	10

