Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks

About

Text watermarking aims to subtly embed statistical signals into text by controlling the sampling process of a Large Language Model (LLM), enabling watermark detectors to verify that the output was generated by the specified model. The robustness of these watermarking algorithms has become a key factor in evaluating their effectiveness. Current text watermarking algorithms embed watermarks in high-entropy tokens to ensure text quality. In this paper, we reveal that this seemingly benign design can be exploited by attackers, posing a significant risk to the robustness of the watermark. We introduce a generic, efficient paraphrasing attack, the Self-Information Rewrite Attack (SIRA), which exploits this vulnerability by calculating the self-information of each token to identify potential pattern tokens and perform a targeted attack. Our work exposes a widely prevalent vulnerability in current watermarking algorithms. Experimental results show that SIRA achieves nearly 100% attack success rates on seven recent watermarking methods at a cost of only 0.88 USD per million tokens. Our approach requires no access to the watermark algorithms or the watermarked LLM and transfers seamlessly to any LLM as the attack model, including mobile-level models. Our findings highlight the urgent need for more robust watermarking.
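The token-scoring idea in the abstract can be illustrated in Python: rank each token by its self-information I(x_t) = -log2 p(x_t | x_<t) under some reference language model, then flag the highest-scoring tokens as candidate "pattern tokens". This is a minimal sketch under stated assumptions, not the authors' implementation. The per-token probabilities are assumed to come from an arbitrary reference LM (the numbers below are made up), and the helper names (`select_high_info_tokens`, `mask_tokens`), the top-fraction selection rule, and the `[MASK]` placeholder are hypothetical choices. The subsequent rewrite step, in which an attack LLM would replace the flagged tokens, is not shown.

```python
import math
from typing import List, Sequence, Set


def self_information(prob: float) -> float:
    """Self-information in bits: I(x) = -log2 p(x)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def select_high_info_tokens(probs: Sequence[float], fraction: float = 0.3) -> Set[int]:
    """Return indices of the top `fraction` of tokens ranked by self-information.

    Hypothetical selection rule for illustration; the paper's exact
    thresholding may differ.
    """
    scores = [self_information(p) for p in probs]
    k = max(1, round(fraction * len(scores))) if scores else 0
    # Highest self-information (lowest conditional probability) first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def mask_tokens(tokens: Sequence[str], indices: Set[int], mask: str = "[MASK]") -> List[str]:
    """Replace the selected positions with a placeholder for a later rewrite step."""
    return [mask if i in indices else tok for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    tokens = ["The", "cat", "sat", "on", "the", "velvet", "cushion", "."]
    # Hypothetical conditional probabilities p(x_t | x_<t) from a reference LM.
    probs = [0.20, 0.05, 0.30, 0.60, 0.70, 0.01, 0.15, 0.90]
    picked = select_high_info_tokens(probs, fraction=0.25)
    print(mask_tokens(tokens, picked))
    # → ['The', '[MASK]', 'sat', 'on', 'the', '[MASK]', 'cushion', '.']
```

Tokens with the lowest conditional probability ("velvet", then "cat") receive the highest self-information and are flagged first; under this sketch's assumptions, those are the positions a watermark would most plausibly have influenced.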

Yixin Cheng, Hongcheng Guo, Yangming Li, Leonid Sigal• 2025

Related benchmarks

Task	Dataset	Metric	Result
Watermark Removal	Watermarked Text (500 tokens)	EWD	0.25	30
Watermark Removal	Watermarked Text (1500 tokens)	EWD	0.00	30
Watermark Evasion	LLM Watermarking Algorithms (KGW, Unigram, UPV, EWD, DIP, SIR, EXP)	KGW Evasion Score	98.8	11
Watermark Evasion	KGW	Attack Success Rate (%)	98.8	6
Watermark Evasion	Unigram	Attack Success Rate (%)	95.0	6
Watermark Evasion	EWD	Attack Success Rate (%)	99.8	6
Watermark Evasion	SIR	Attack Success Rate (%)	72.8	6
Watermark Evasion	UPV	Attack Success Rate (%)	87.6	6
Watermark Evasion	DIP	Attack Success Rate (%)	99.6	6
Watermark Evasion	EXP	Attack Success Rate (%)	95.2	6

