
LAMP: Extracting Text from Gradients with Language Model Priors

About

Recent work shows that sensitive user data can be reconstructed from gradient updates, breaking the key privacy promise of federated learning. While success was demonstrated primarily on image data, these methods do not directly transfer to other domains such as text. In this work, we propose LAMP, a novel attack tailored to textual data that successfully reconstructs original text from gradients. Our attack is based on two key insights: (i) modeling prior text probability with an auxiliary language model, guiding the search towards more natural text, and (ii) alternating continuous and discrete optimization, which minimizes reconstruction loss on embeddings while avoiding local minima by applying discrete text transformations. Our experiments demonstrate that LAMP is significantly more effective than prior work: it reconstructs 5x more bigrams and 23% longer subsequences on average. Moreover, we are the first to recover inputs from batch sizes larger than 1 for textual models. These findings indicate that gradient updates of models operating on textual data leak more information than previously thought.
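The alternating scheme in insight (ii) can be illustrated with a deliberately tiny sketch. Everything below is hypothetical and simplified: a one-layer linear model stands in for the transformer whose gradients leak, the "vocabulary" is a random embedding table seeded with the true embedding, gradients of the reconstruction loss are taken by finite differences, and the language-model prior from insight (i) is omitted entirely. It shows only the structure of the attack: continuous descent on candidate embeddings, periodically interleaved with a discrete projection step that escapes local minima.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                        # toy embedding dimension
w = rng.normal(size=d)       # weights of a toy linear model
t = 1.0                      # training label

def train_grad(x):
    # gradient of the training loss (w.x - t)^2 with respect to w
    return 2.0 * (w @ x - t) * x

x_true = rng.normal(size=d)      # the "private" input embedding
g_obs = train_grad(x_true)       # gradient observed by the attacker

vocab = rng.normal(size=(8, d))  # hypothetical token embedding table
vocab[3] = x_true                # plant the true token so snapping can find it

def rec_loss(x):
    # distance between the candidate's gradient and the observed one
    r = train_grad(x) - g_obs
    return float(r @ r)

def num_grad(f, x, eps=1e-5):
    # central finite differences; a real attack would use autograd
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = rng.normal(size=d)           # random initialization
for step in range(200):
    # continuous phase: gradient descent on the embedding
    x -= 0.05 * num_grad(rec_loss, x)
    if step % 50 == 49:
        # discrete phase: snap to the vocabulary entry with lowest loss
        cand = min(vocab, key=rec_loss)
        if rec_loss(cand) < rec_loss(x):
            x = cand.copy()

print(rec_loss(x))
```

In this toy setting the discrete snap recovers the planted embedding exactly. LAMP's actual discrete moves are richer (token swaps and reorderings scored by a GPT-2 prior), but the continuous/discrete alternation follows the same pattern.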

Mislav Balunović, Dimitar I. Dimitrov, Nikola Jovanović, Martin Vechev • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Text reconstruction from gradients | Rotten Tomatoes | ROUGE-1: 77.6 | 36 |
| Training Data Reconstruction | SST | ROUGE-1: 0.888 | 32 |
| Training Data Reconstruction | RT | ROUGE-1: 0.647 | 32 |
| Training Data Reconstruction | COLA | ROUGE-1: 89.6 | 32 |
| Text reconstruction from gradients | COLA | ROUGE-1: 94.5 | 24 |
| Text reconstruction from gradients | SST-2 | ROUGE-1: 91.6 | 24 |

Other info

Code
