SAMark: A Self-Anchored Text Watermarking with Paragraph-Level Paraphrase Robustness

About

Semantic-level watermarking (SWM) improves robustness against text modifications by treating sentences as the basic unit. However, robustness to paragraph-level paraphrasing remains difficult because such attacks globally disrupt watermark signals by changing sentence order. In this work, we propose SAMark, a self-anchored watermarking framework that removes the dependency on sentence order by establishing a step-independent green region in semantic space. To improve detectability, we introduce a multi-channel hyperbolic scoring mechanism that amplifies watermark signals while suppressing noise from weakly aligned candidates. We further propose a diversity-aware filtering strategy that combines hard filtering with soft regularization, extending beyond simple n-gram repetition filters to address semantic redundancy. Experimental results show that SAMark achieves up to 90.2% TP@FP1% under typical paragraph-level paraphrasing attacks, outperforming the strongest prior baseline by more than 30% on average, while maintaining generation quality competitive with unwatermarked text and breaking the robustness-quality trade-off that limits prior methods. Our code will be released at [this URL](https://github.com/Z1zs/SAMark).

Jiahao Huo, Wenjie Qu, Yibo Yan, Kening Zheng, Jiaheng Zhang, Xuming Hu, Philip S. Yu, Mingxun Zhou • 2026

Related benchmarks

Task	Dataset	Metric	Value
Large Language Model Watermarking	BookSum (test)	Average Rank	5.14	20
Watermark Detection	C4	TPR @ 1% FPR (No Attack)	95.2	20
Text Quality Assessment	C4	Average Rank	5.98	20
Watermark Detection	BOOKSUM Mistral-Small-3.1-24B-Base-2503 (test)	Latency per Token	0.0014	9
Watermarked Text Generation	BOOKSUM Mistral-Small-3.1-24B-Base-2503 (test)	Latency per Token (s/tok)	0.218	9
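
The detection metric reported above (TPR @ 1% FPR, written TP@FP1% in the abstract) is conventionally read as the true-positive rate on watermarked text when the detection threshold is set so that at most 1% of unwatermarked text is flagged. The sketch below illustrates that standard computation only. It is not taken from the SAMark code, and the function name `tpr_at_fpr` and the score lists are hypothetical.

```python
def tpr_at_fpr(negative_scores, positive_scores, target_fpr=0.01):
    """Return (threshold, TPR) with the false-positive rate held at or below target_fpr.

    negative_scores: detector scores for unwatermarked texts (hypothetical input).
    positive_scores: detector scores for watermarked texts (hypothetical input).
    A text is flagged as watermarked when its score is strictly above the threshold.
    """
    if not negative_scores or not positive_scores:
        raise ValueError("both score lists must be non-empty")

    ranked = sorted(negative_scores, reverse=True)

    # How many unwatermarked texts may be flagged without exceeding the target FPR.
    # The small epsilon guards against floating-point results such as 2.9999999.
    allowed = int(target_fpr * len(ranked) + 1e-9)

    # Using the (allowed + 1)-th highest negative score as the threshold means
    # at most `allowed` negatives score strictly above it.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")

    flagged = sum(score > threshold for score in positive_scores)
    return threshold, flagged / len(positive_scores)


if __name__ == "__main__":
    unwatermarked = list(range(100))          # toy scores 0..99
    watermarked = [97, 98, 99, 100, 150]      # toy scores
    print(tpr_at_fpr(unwatermarked, watermarked))
```

Ties at the threshold are treated as not flagged, so the realized false-positive rate can fall below the target but never above it.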

