DP-MLM: Differentially Private Text Rewriting Using Masked Language Models

About

The task of text privatization using Differential Privacy has recently taken the form of $\textit{text rewriting}$, in which an input text is obfuscated via the use of generative (large) language models. While these methods have shown promising results in their ability to preserve privacy, they rely on autoregressive models, which lack a mechanism to contextualize the private rewriting process. In response, we propose $\textbf{DP-MLM}$, a new method for differentially private text rewriting that leverages masked language models (MLMs) to rewrite text in a semantically similar $\textit{and}$ obfuscated manner. We accomplish this with a simple contextualization technique, whereby we rewrite a text one token at a time. We find that utilizing encoder-only MLMs provides better utility preservation at lower $\varepsilon$ levels than previous methods relying on larger models with a decoder. In addition, MLMs allow for greater customization of the rewriting mechanism than generative approaches. We make the code for $\textbf{DP-MLM}$ public and reusable at https://github.com/sjmeis/DPMLM.
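To make the token-by-token idea concrete, here is a minimal sketch of one way such a rewriting loop could look. It is not the authors' implementation (that lives in the linked repository). It assumes that each replacement is drawn with the exponential mechanism over clipped scores, i.e. sampling with temperature $2\Delta/\varepsilon$ where $\Delta$ is the clipping range. The vocabulary, the synonym map, `toy_mlm_logits`, `dp_sample`, `rewrite`, and the clipping bounds are all hypothetical stand-ins; a real system would obtain the scores from a masked language model.

```python
import math
import random

# Hypothetical toy vocabulary and synonym map (assumptions for illustration only).
VOCAB = ["the", "a", "doctor", "physician", "nurse", "saw", "met", "patient", "client"]
SYNONYMS = {"the": {"a"}, "doctor": {"physician"}, "saw": {"met"}, "patient": {"client"}}


def toy_mlm_logits(tokens, position):
    """Stand-in for an MLM: score each vocabulary word as a replacement
    for tokens[position]. A real model would use the surrounding context."""
    original = tokens[position]
    related = SYNONYMS.get(original, set())
    return {w: 6.0 if w == original else 3.0 if w in related else 0.0 for w in VOCAB}


def dp_sample(logits, epsilon, lo, hi, rng):
    """Exponential mechanism over clipped scores.

    Clipping bounds every score to [lo, hi], so the sensitivity is hi - lo.
    Sampling with probability proportional to exp(eps * score / (2 * sensitivity))
    is the same as softmax sampling with temperature 2 * sensitivity / eps.
    """
    sensitivity = hi - lo
    temperature = 2.0 * sensitivity / epsilon
    words = list(logits)
    scaled = [min(hi, max(lo, logits[w])) / temperature for w in words]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(words, weights=weights, k=1)[0]


def rewrite(text, epsilon, lo=0.0, hi=5.0, rng=None):
    """Replace tokens one at a time; each draw uses budget epsilon."""
    rng = rng or random.Random()
    tokens = text.split()
    return " ".join(
        dp_sample(toy_mlm_logits(tokens, i), epsilon, lo, hi, rng)
        for i in range(len(tokens))
    )


if __name__ == "__main__":
    print(rewrite("the doctor saw the patient", epsilon=25.0, rng=random.Random(0)))
```

In this sketch a very large `epsilon` returns the input unchanged, because the original token always has the highest toy score, while a small `epsilon` pushes each draw toward a uniform choice over the vocabulary.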

Stephen Meisenbacher, Maulik Chevli, Juraj Vladika, Florian Matthes • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Agentic re-identification | 27 transcripts GPT-5.1 attacker | Re-ID Success Count | 4 | 18 |
| Agentic re-identification | 27 transcripts Gemini-3-Flash attacker | Re-ID Success Count | 4 | 18 |
| Agentic re-identification | 27 transcripts GPT-5.4-mini attacker | Re-ID Success Rate (Derived) | 18.5185 | 18 |
| Medical Diagnosis Classification | Pri-SLJA (test) | Accuracy | 81.14 | 7 |
| Privacy Rewriting | Pri-SLJA | Accuracy | 78.78 | 7 |
| Medical Diagnosis Classification | Pri-DDXPlus (test) | Accuracy | 52.19 | 7 |
| Medical Diagnosis Classification | Pri-Mixture (test) | Accuracy | 61.27 | 7 |
| Privacy Rewriting | DDXPlus Pri | Accuracy | 50.94 | 7 |
| Privacy Rewriting | Pri-Mixture | Accuracy | 59.6 | 7 |
| Text Rewriting | ECHR court cases | Linkability Proportion (arity=1) | 0.1 | 6 |

Showing 10 of 11 rows