Composition for Pufferfish Privacy
About
When creating public data products out of confidential datasets, inferential/posterior-based privacy definitions, such as Pufferfish, provide compelling privacy semantics for data with correlations. However, such privacy definitions are rarely used in practice because they do not always compose. For example, it is possible to design algorithms for these privacy definitions that have no leakage when run once but reveal the entire dataset when run more than once. We prove necessary and sufficient conditions that must be added to ensure linear composition for Pufferfish mechanisms, hence avoiding such privacy collapse. These extra conditions turn out to be differential privacy-style inequalities, indicating that achieving both the interpretable semantics of Pufferfish for correlated data and the benefits of composition requires adapting differentially private mechanisms to Pufferfish. We show that such translation is possible through a concept called the $(a,b)$-influence curve, and that many existing differentially private algorithms can be translated with our framework into composable Pufferfish algorithms. We illustrate the benefit of our new framework by designing composable Pufferfish algorithms for Markov chains that significantly outperform prior work.
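The composition failure described above can be seen in a toy setting. The sketch below is an illustration under stated assumptions, not the paper's construction: the dataset is two bits `(X1, X2)`, the secret is the value of `X1`, the attacker's prior treats both bits as independent and uniform, and the two deterministic releases `xor_release` and `x2_release` are hypothetical names chosen for this example. It also uses two different releases rather than one algorithm run repeatedly. Each release on its own leaves the posterior on `X1` equal to the prior, yet the pair determines `X1` exactly.

```python
from fractions import Fraction
from itertools import product


def posterior_of_x1(mechanisms):
    """Return {outputs: P(X1 = 1 | outputs)} under a uniform prior on (X1, X2)."""
    joint = {}  # outputs -> [prior mass with X1 = 0, prior mass with X1 = 1]
    for x1, x2 in product((0, 1), repeat=2):
        out = tuple(m(x1, x2) for m in mechanisms)
        joint.setdefault(out, [Fraction(0), Fraction(0)])[x1] += Fraction(1, 4)
    return {out: mass[1] / (mass[0] + mass[1]) for out, mass in joint.items()}


# Hypothetical releases (assumptions for this toy example only).
def xor_release(x1, x2):
    return x1 ^ x2


def x2_release(x1, x2):
    return x2


alone_xor = posterior_of_x1([xor_release])
alone_x2 = posterior_of_x1([x2_release])
both = posterior_of_x1([xor_release, x2_release])

print("XOR release alone:", alone_xor)
print("X2 release alone: ", alone_x2)
print("Both releases:    ", both)
```

In this toy model, every posterior for a single release stays at the prior value of 1/2, while every posterior for the combined release is 0 or 1. The privacy of each release rests entirely on the attacker's uncertainty about `X2`, which the other release removes; this is the kind of collapse that the additional differential privacy-style conditions are meant to rule out.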
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Top-k popular visited locations selection | Foursquare check-in Top-3 popular visited locations | HR 97.8 | 48 |
| Activity Prediction | Capture24 Top-3 most frequent activities | Top-1 Acc 0.939 | 24 |
| Human Activity Recognition | Capture24 2024 | HR 0.912 | 24 |
| Top-5 most frequently visited location querying | Foursquare check-in raw (test) | Acc@1 96.14 | 24 |
| Top-k popular visited location prediction | Foursquare check-in dataset Top-3 popular visited locations 77 most common labels | Acc@1 97.3 | 24 |