
SAMark: A Self-Anchored Text Watermarking with Paragraph-Level Paraphrase Robustness

About

Semantic-level watermarking (SWM) improves robustness against text modifications by treating sentences as the basic unit. However, robustness to paragraph-level paraphrasing remains difficult because such attacks globally disrupt watermark signals by changing sentence order. In this work, we propose SAMark, a self-anchored watermarking framework that removes the dependency on sentence order by establishing a step-independent green region in semantic space. To improve detectability, we introduce a multi-channel hyperbolic scoring mechanism that amplifies watermark signals while suppressing noise from weakly aligned candidates. We further propose a diversity-aware filtering strategy that combines hard filtering with soft regularization, extending beyond simple n-gram repetition filters to address semantic redundancy. Experimental results show that SAMark achieves up to 90.2% TP@FP1% under typical paragraph-level paraphrasing attacks, outperforming the strongest prior baseline by more than 30% on average, while maintaining generation quality competitive with unwatermarked text and breaking the robustness-quality trade-off that limits prior methods.

Jiahao Huo, Wenjie Qu, Yibo Yan, Kening Zheng, Jiaheng Zhang, Xuming Hu, Philip S. Yu, Mingxun Zhou • 2026
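
The abstract's central claim is that a step-independent green region makes detection insensitive to sentence order. The toy sketch below illustrates only that property and is not SAMark's actual method. It assumes sentences are already embedded as vectors, and it uses a keyed random half-space as a stand-in green region. The names `green_direction`, `is_green`, and `detection_z` are hypothetical, as is the assumption that unwatermarked sentences land in the region with probability 0.5.

```python
import math
import random

def green_direction(key, dim):
    # Hypothetical stand-in: derive one fixed direction from a secret key.
    # The same key always yields the same direction, regardless of position.
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def is_green(embedding, direction):
    # A sentence counts as "green" if its embedding lies in the half-space
    # on the positive side of the keyed direction.
    return sum(e * d for e, d in zip(embedding, direction)) > 0.0

def detection_z(embeddings, key):
    # One-proportion z-score against the assumed null rate of 0.5.
    # Only the count of green sentences matters, so order is irrelevant.
    direction = green_direction(key, len(embeddings[0]))
    n = len(embeddings)
    greens = sum(is_green(e, direction) for e in embeddings)
    return (greens - 0.5 * n) / math.sqrt(0.25 * n)

if __name__ == "__main__":
    rng = random.Random(0)
    sentences = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
    reordered = sentences[:]
    rng.shuffle(reordered)
    # Reordering the sentences leaves the score unchanged.
    print(detection_z(sentences, key=42) == detection_z(reordered, key=42))
```

Because the region depends only on the key and not on preceding sentences, a paraphrase that reorders sentences cannot change the count by reordering alone. A scheme that seeds each step from the previous sentence would not have this property.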

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Large Language Model Watermarking | BookSum (test) | Average Rank | 5.14 | 20 |
| Watermark Detection | C4 | TPR @ 1% FPR (No Attack) | 95.2 | 20 |
| Text Quality Assessment | C4 | Average Rank | 5.98 | 20 |
| Watermark Detection | BOOKSUM Mistral-Small-3.1-24B-Base-2503 (test) | Latency per Token | 0.0014 | 9 |
| Watermarked Text Generation | BOOKSUM Mistral-Small-3.1-24B-Base-2503 (test) | Latency per Token (s/tok) | 0.218 | 9 |
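
The detection results above are reported as true-positive rate at a fixed 1% false-positive rate. A minimal sketch of how such a metric is commonly computed follows. It assumes a detector that outputs one score per text, with higher scores meaning "more likely watermarked". The function name `tpr_at_fpr` and the strict-inequality thresholding rule are illustrative choices, not taken from the paper.

```python
import math

def tpr_at_fpr(neg_scores, pos_scores, target_fpr=0.01):
    """Return (tpr, achieved_fpr, threshold) for a score-based detector.

    neg_scores: detector scores on unwatermarked texts.
    pos_scores: detector scores on watermarked texts.
    """
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    neg = sorted(neg_scores)
    pos = list(pos_scores)
    if not neg or not pos:
        raise ValueError("both score lists must be non-empty")
    # Number of unwatermarked texts we may flag without exceeding the target.
    # The small epsilon guards against float round-off such as 2.9999999.
    k = math.floor(target_fpr * len(neg) + 1e-9)
    # Flag a text only if its score is strictly above the (k+1)-th largest
    # negative score, so at most k negatives are flagged even with ties.
    threshold = neg[len(neg) - k - 1]
    fpr = sum(s > threshold for s in neg) / len(neg)
    tpr = sum(s > threshold for s in pos) / len(pos)
    return tpr, fpr, threshold

if __name__ == "__main__":
    negatives = list(range(100))          # scores 0..99 on clean text
    positives = [98.5, 99.5, 100.0, 50.0]  # scores on watermarked text
    print(tpr_at_fpr(negatives, positives))
```

With few unwatermarked samples the achievable false-positive rates are coarse (multiples of 1/N), so the achieved rate is returned alongside the true-positive rate rather than assumed to equal the target.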
