Differentially Private Histograms in the Shuffle Model from Fake Users

About

There has been much recent work in the shuffle model of differential privacy, particularly for approximate $d$-bin histograms. While these protocols achieve low error, the number of messages sent by each user -- the message complexity -- has so far scaled with $d$ or the privacy parameters. The message complexity is an informative predictor of a shuffle protocol's resource consumption. We present a protocol whose message complexity is two when there are sufficiently many users. The protocol essentially pairs each row in the dataset with a fake row and performs a simple randomization on all rows. We show that the error introduced by the protocol is small, using rigorous analysis as well as experiments on real-world data. We also prove that corrupt users have a relatively low impact on our protocol's estimates.

Albert Cheu, Maxim Zhilyaev• 2021

Related benchmarks

Task	Dataset	Result
Frequency Estimation	AOL	Relative Error7.1	6
Frequency Estimation	SF_Sal	Relative Error5.81	6
Frequency Estimation	BR_Sal	Relative Error (%)5.8	6
Frequency Estimation (Qhist)	Synthetic Zipf distribution	Relative Error (w/o attacker)0.12	3
Frequency Estimation	Shuffle-DP Theoretical Analysis	Messages per User2	3
Frequency Estimation (Qhist)	Synthetic Unif distribution	Relative Error (w/o Attacker)5.84	3
Frequency Estimation (Qhist)	Synthetic Gauss distribution	Relative Error (No Attacker)5.5	3

Showing 7 of 7 rows

Other info

Follow for update

@wizwand_team Discord