IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization

About

In this work, we address the problem of text anonymization, where the goal is to prevent adversaries from correctly inferring private attributes of the author while preserving the text's utility, i.e., its meaning and semantics. We propose IncogniText, a technique that anonymizes the text to mislead a potential adversary into predicting a wrong private attribute value. Our empirical evaluation shows that IncogniText reduces private attribute leakage by more than 90% across 8 different private attributes. Finally, we demonstrate the maturity of IncogniText for real-world applications by distilling its anonymization capability into a set of LoRA parameters associated with an on-device model. Our results show that privacy leakage can be reduced by more than half with limited impact on utility.
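The core idea described above is to condition the rewrite on a deliberately wrong attribute value rather than simply deleting clues. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `sample_target_value`, `build_anonymization_prompt`, `leakage_rate`, and `relative_reduction` are hypothetical helpers, the prompt wording is invented for illustration, and the LLM call itself is left out. Leakage is taken here to mean the fraction of texts for which an adversary recovers the true attribute value, which is one plausible reading of "private attribute leakage".

```python
import random
from typing import Sequence


def sample_target_value(true_value: str, candidates: Sequence[str], rng: random.Random) -> str:
    """Hypothetical helper: pick a decoy value that differs from the author's true value."""
    decoys = [c for c in candidates if c != true_value]
    if not decoys:
        raise ValueError("need at least one candidate different from the true value")
    return rng.choice(decoys)


def build_anonymization_prompt(text: str, attribute: str, target_value: str) -> str:
    """Hypothetical LLM instruction: keep the meaning, but steer inference toward target_value."""
    return (
        f"Rewrite the following text so that a reader would infer the author's {attribute} "
        f"to be '{target_value}', while preserving its meaning.\n\nText:\n{text}"
    )


def leakage_rate(predictions: Sequence[str], truths: Sequence[str]) -> float:
    """Assumed metric: fraction of texts where the adversary's guess equals the true value."""
    if not truths or len(predictions) != len(truths):
        raise ValueError("predictions and truths must be non-empty and of equal length")
    hits = sum(p == t for p, t in zip(predictions, truths))
    return hits / len(truths)


def relative_reduction(before: float, after: float) -> float:
    """Relative drop in leakage; 0.9 corresponds to a 90% reduction."""
    if before <= 0:
        raise ValueError("baseline leakage must be positive")
    return (before - after) / before


if __name__ == "__main__":
    rng = random.Random(0)
    # Choose a wrong target value for the author's (hypothetical) attribute.
    target = sample_target_value("female", ["female", "male"], rng)
    print(build_anonymization_prompt("Just got back from my shift at the clinic.", "gender", target))

    truths = ["F", "M", "F", "M"]
    before = leakage_rate(["F", "M", "F", "F"], truths)  # 0.75: adversary right on 3 of 4 texts
    after = leakage_rate(["F", "F", "M", "F"], truths)   # 0.25: adversary right on 1 of 4 texts
    print(relative_reduction(before, after))             # ≈ 0.667
```

Sampling the decoy uniformly from the remaining candidates is one simple choice; a real system might instead pick a value that is plausible for the text, which this sketch does not attempt.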

Ahmed Frikha, Nassim Walha, Krishna Kanth Nakka, Ricardo Mendes, Xue Jiang, Xuebing Zhou (2024)

Related benchmarks

Task	Dataset	Metric	Result	
Question Answering	NQ (test)	--	--	143
Question Answering	PopQA (test)	Accuracy	38.24	122
Private Information Retention	PopQA D_special (test)	r_pri	28.65	20
Private Information Retention	NQ D_special (test)	r_pri	15.22	20
Private Information Retention	HQA D_special (test)	r_pri	17.87	20
Private Information Retention	TQA D_special (test)	r_pri	16.26	20
Text Anonymization	PersonalReddit	Privacy Score	12.3	14
Question Answering	HQA (test)	Accuracy	22.18	11
De-Anonymization Resistance	PopQA (test)	r_connect	0.3923	10
De-Anonymization Resistance	HQA (test)	r_connect	0.4284	10

Table excerpt: 10 of 22 rows shown.
