Mask-Free Privacy Extraction and Rewriting: A Domain-Aware Approach via Prototype Learning
About
Client-side privacy rewriting is crucial for deploying LLMs in privacy-sensitive domains. However, existing approaches struggle to balance privacy and utility. Full-text methods often distort context, while span-level approaches rely on impractical manual masks or brittle static dictionaries. Attempts to automate localization with prompt-based LLMs prove unreliable, because unstable instruction following leads to both privacy leakage and excessive context scrubbing. To address these limitations, we propose DAMPER (Domain-Aware Mask-free Privacy Extraction and Rewriting). DAMPER operationalizes latent privacy semantics into compact Domain Privacy Prototypes via contrastive learning, enabling precise, autonomous span localization. Furthermore, we introduce Prototype-Guided Preference Alignment, which uses the learned prototypes as semantic anchors to construct preference pairs, optimizing a domain-compliant rewriting policy without human annotations. At inference time, DAMPER integrates a sampling-based Exponential Mechanism to provide rigorous span-level Differential Privacy (DP) guarantees. Extensive experiments demonstrate that DAMPER significantly outperforms existing baselines, achieving a superior privacy-utility trade-off.
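The abstract does not spell out how the sampling-based Exponential Mechanism is instantiated, so the following is only a minimal sketch of the standard mechanism applied to one sensitive span. It assumes a finite list of candidate replacements and a utility score per candidate with known sensitivity. The function names, candidate strings, and scores are hypothetical illustrations, not DAMPER's actual components.

```python
import math
import random


def selection_probabilities(utilities, epsilon, sensitivity=1.0):
    """Standard Exponential Mechanism: P(i) is proportional to
    exp(epsilon * u_i / (2 * sensitivity)).

    `utilities` holds hypothetical per-candidate scores; how DAMPER scores
    candidates is not described in the abstract.
    """
    if not utilities:
        raise ValueError("utilities must be non-empty")
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scaled = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    peak = max(scaled)  # subtract the max before exponentiating for stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_replacement(candidates, utilities, epsilon, sensitivity=1.0, rng=None):
    """Draw one replacement for a sensitive span according to the mechanism."""
    if len(candidates) != len(utilities):
        raise ValueError("candidates and utilities must have the same length")
    rng = rng or random.Random()
    probs = selection_probabilities(utilities, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical candidates and utility scores for a single span.
    candidates = ["a chronic condition", "a metabolic disorder", "an illness"]
    utilities = [0.9, 0.6, 0.1]
    print(selection_probabilities(utilities, epsilon=2.0))
    print(sample_replacement(candidates, utilities, epsilon=2.0))
```

A smaller `epsilon` flattens the distribution toward uniform (stronger privacy, lower expected utility), while a larger `epsilon` concentrates mass on the highest-utility candidate. The privacy guarantee of the standard mechanism depends on `sensitivity` correctly bounding how much the utility can change between neighboring inputs.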
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Diagnosis Classification | Pri-DDXPlus (test) | Accuracy | 79.71 | 7 |
| Medical Diagnosis Classification | Pri-SLJA (test) | Accuracy | 83.17 | 7 |
| Medical Diagnosis Classification | Pri-Mixture (test) | Accuracy | 80.01 | 7 |
| Privacy Rewriting | DDXPlus Pri | Accuracy | 78.13 | 7 |
| Privacy Rewriting | Pri-SLJA | Accuracy | 82.68 | 7 |
| Privacy Rewriting | Pri-Mixture | Accuracy | 78.29 | 7 |