A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models
About
Watermarking techniques offer a promising way to identify machine-generated content by embedding covert information into the text generated by language models. A key challenge in this domain is preserving the distribution of the original generated content after watermarking. Our research extends and improves upon existing watermarking frameworks, emphasizing the importance of a Distribution-Preserving (DiP) watermark. Unlike current strategies, our proposed DiPmark simultaneously preserves the original token distribution during watermarking (distribution-preserving), is detectable without access to the language model API or prompts (accessible), and is provably robust to moderate changes of tokens (resilient). DiPmark operates by selecting a random set of tokens before each token is generated, then modifying the token distribution through a distribution-preserving reweight function that raises the probability of the selected tokens during sampling. Extensive empirical evaluation on various language models and tasks demonstrates our approach's distribution-preserving property, accessibility, and resilience, making it an effective solution for watermarking tasks that demand strict quality preservation.
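The reweighting step can be illustrated with a small sketch. The abstract does not spell out the reweight function, so the code below assumes one plausible form: tokens are laid out along a permutation (standing in for the key-derived random choice), and the cumulative distribution `F` is mapped to `max(F - alpha, 0) + max(F - (1 - alpha), 0)`. The names `dip_reweight`, `average_over_permutations`, and the parameter `alpha` are hypothetical and chosen for illustration only.

```python
import itertools


def dip_reweight(probs, perm, alpha=0.5):
    """Reweight `probs` along the token order `perm`.

    Assumed (illustrative) rule: with F the cumulative probability along
    `perm`, the reweighted cumulative mass is
        max(F - alpha, 0) + max(F - (1 - alpha), 0),
    which moves mass toward tokens that appear later in `perm`.
    """
    if not 0.0 <= alpha <= 0.5:
        raise ValueError("alpha must lie in [0, 0.5]")
    out = [0.0] * len(probs)
    cum = 0.0   # original cumulative probability along perm
    prev = 0.0  # reweighted cumulative probability at the previous token
    for tok in perm:
        cum += probs[tok]
        cur = max(cum - alpha, 0.0) + max(cum - (1.0 - alpha), 0.0)
        out[tok] = cur - prev
        prev = cur
    return out


def average_over_permutations(probs, alpha=0.5):
    """Average the reweighted distribution over every permutation.

    Enumerating all permutations is only feasible for toy vocabularies;
    it stands in for the expectation over the random key.
    """
    perms = list(itertools.permutations(range(len(probs))))
    total = [0.0] * len(probs)
    for perm in perms:
        for tok, p in enumerate(dip_reweight(probs, perm, alpha)):
            total[tok] += p
    return [t / len(perms) for t in total]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]
    # One fixed order: the first token loses mass, later tokens gain it.
    print(dip_reweight(probs, perm=(0, 1, 2), alpha=0.5))
    # Averaged over all orders, the original distribution is recovered
    # up to floating-point error.
    print(average_over_permutations(probs, alpha=0.3))
```

Under this assumed rule, any single permutation skews sampling toward the tokens placed late in the order, while a permutation and its reverse together average back to the original probabilities. That pairing is what makes the expectation over random orders match the unwatermarked distribution in this sketch.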
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Watermark Detection | C4 subset | -- | -- | 24 |
| Spoofing Attack Robustness | BookSum | AUC | 0.5375 | 20 |
| Spoofing Attack Robustness | C4 RealNewsLike | AUC | 0.5569 | 20 |
| Spoofing attack traceability | RTP-LX (test) | AUC | 51.85 | 20 |
| Spoofing attack traceability | RealToxicityPrompts (test) | AUC | 49.48 | 20 |
| Paraphrase Attack Robustness | C4 RealNewsLike | AUC | 0.5337 | 20 |
| Paraphrase Attack Robustness | BookSum | AUC | 55.12 | 20 |
| Text Generation | C4 | TPR @ FPR=1% | 89.01 | 15 |
| Latency Analysis | LVLM generation | Average Generation Time (s) | 8.3464 | 14 |
| Multimodal Watermarking | MS-COCO 17 | PPL | 3.13 | 14 |