
A Customized Text Sanitization Mechanism with Differential Privacy

About

As privacy issues are receiving increasing attention within the Natural Language Processing (NLP) community, numerous methods have been proposed to sanitize texts subject to differential privacy. However, the state-of-the-art text sanitization mechanisms based on metric local differential privacy (MLDP) do not apply to non-metric semantic similarity measures and cannot achieve good trade-offs between privacy and utility. To address the above limitations, we propose a novel Customized Text (CusText) sanitization mechanism based on the original $\epsilon$-differential privacy (DP) definition, which is compatible with any similarity measure. Furthermore, CusText assigns each input token a customized output set of tokens to provide more advanced privacy protection at the token level. Extensive experiments on several benchmark datasets show that CusText achieves a better trade-off between privacy and utility than existing mechanisms. The code is available at https://github.com/sai4july/CusText.
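To make the idea concrete, the following is a minimal sketch of ε-DP token sanitization in the spirit described above: each input token has a customized candidate output set, and a replacement is sampled with exponential-mechanism-style weighting under an arbitrary (not necessarily metric) similarity measure. The function names, the toy similarity, and the sensitivity assumption are illustrative, not the paper's exact construction.

```python
import math
import random

def sanitize_token(token, output_set, similarity, epsilon):
    """Sample a DP replacement for `token` from its customized output set.

    Exponential-mechanism-style sampling (a standard DP building block;
    this is a hypothetical sketch, not necessarily CusText's exact scoring):
    candidates with higher similarity to the input token are exponentially
    more likely to be chosen. Similarity scores are assumed to lie in [0, 1],
    giving sensitivity 1.
    """
    weights = [math.exp(epsilon * similarity(token, cand) / 2)
               for cand in output_set]
    total = sum(weights)
    r = random.random() * total
    acc = 0.0
    for cand, w in zip(output_set, weights):
        acc += w
        if r <= acc:
            return cand
    return output_set[-1]  # guard against floating-point rounding

def sanitize_text(tokens, output_sets, similarity, epsilon):
    """Sanitize a token sequence, one customized output set per token."""
    return [sanitize_token(t, output_sets[t], similarity, epsilon)
            for t in tokens]
```

In practice the similarity could be, e.g., cosine similarity of word embeddings; because the mechanism only ranks candidates by score, any similarity measure works, which is the flexibility the abstract claims over MLDP-based mechanisms.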

Huimin Chen, Fengran Mo, Yanhao Wang, Cen Chen, Jian-Yun Nie, Chengyu Wang, Jamie Cui • 2022

Related benchmarks

Task | Dataset | Metric | Result | Rank
Sentiment Classification | SST-2 (test) | Accuracy | 81.55 | 233
Text Classification | SST-2 | Accuracy | 76.83 | 125
Natural Language Inference | QNLI | Accuracy | 77.4 | 61
Semantic Textual Similarity | MedSTS | Pearson Correlation | 0.6316 | 17
Query Attack | SST-2 (test) | Query Count (she) | 4 | 11
Privacy-Preserving Text Generation | CNN Daily Mail | Cosine Similarity | 0.585 | 9
Privacy-Preserving Text Generation | Wikitext-103 v1 | Cosine Similarity | 0.627 | 9
Privacy-Preserving Text Generation | ArXiv Dataset | Cosine Similarity | 69.4 | 9
Privacy Protection | FinRED | BERT Inference Attack Rate | 86.5 | 9
Privacy Protection against Inference Attacks | MedQA | BERT Inference Attack Success Rate | 83.1 | 9

Other info

Code

https://github.com/sai4july/CusText