SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation
About
Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design. To address this issue, we propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH), which partitions the semantic space of sentences. The algorithm encodes and LSH-hashes a candidate sentence generated by an LLM, and conducts sentence-level rejection sampling until the sampled sentence falls in a watermarked partition of the semantic embedding space. A margin-based constraint is used to enhance its robustness. To show the advantages of our algorithm, we propose a "bigram" paraphrase attack, which selects the paraphrase that has the fewest bigram overlaps with the original sentence. This attack is shown to be effective against the existing token-level watermarking method. Experimental results show that our semantic watermark algorithm is not only more robust than the previous state-of-the-art method against both common and bigram paraphrase attacks, but also better at preserving generation quality.
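The two mechanisms described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes sign-based random-hyperplane LSH, a cosine margin around each hyperplane, and a watermarked region set drawn from a seed; the abstract specifies none of these details. All function names, the `generate` and `embed` callables, and the parameters `ratio`, `margin`, and `max_tries` are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_hyperplanes(d, dim, seed=0):
    """Draw d random Gaussian normal vectors; each one splits the space in two."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(d)]


def lsh_signature(embedding, hyperplanes):
    """Pack one sign bit per hyperplane into an integer in [0, 2**d)."""
    sig = 0
    for h in hyperplanes:
        sig = (sig << 1) | (1 if dot(h, embedding) > 0 else 0)
    return sig


def valid_regions(d, ratio, seed):
    """Assumption: pseudorandomly mark a fraction `ratio` of the 2**d regions as watermarked."""
    regions = list(range(2 ** d))
    random.Random(seed).shuffle(regions)
    return set(regions[: int(ratio * len(regions))])


def within_margin(embedding, hyperplanes, margin):
    """Assumed margin rule: True if |cosine| to any hyperplane normal is below `margin`.

    The embedding is expected to be non-zero.
    """
    norm_e = math.sqrt(dot(embedding, embedding))
    for h in hyperplanes:
        cos = dot(h, embedding) / (math.sqrt(dot(h, h)) * norm_e)
        if abs(cos) < margin:
            return True
    return False


def sample_watermarked(generate, embed, hyperplanes, valid, margin=0.0, max_tries=50):
    """Resample sentences until one lands in a valid region, outside the margin."""
    sentence = None
    for _ in range(max_tries):
        sentence = generate()
        e = embed(sentence)
        if within_margin(e, hyperplanes, margin):
            continue  # too close to a partition boundary, so resample
        if lsh_signature(e, hyperplanes) in valid:
            return sentence, True
    return sentence, False  # budget exhausted: keep the last candidate


def bigrams(text):
    """Set of adjacent token pairs, using whitespace tokenization (an assumption)."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def bigram_attack(original, paraphrases):
    """Pick the paraphrase sharing the fewest bigrams with the original."""
    ref = bigrams(original)
    return min(paraphrases, key=lambda p: len(ref & bigrams(p)))
```

In this sketch, `generate` stands in for sampling one sentence from an LLM and `embed` for a sentence encoder. A larger `margin` rejects more candidates, trading extra sampling for signatures that are less likely to flip under small semantic shifts.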
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Watermark Detection | BookSum | TP @ FP=1% | 98.4 | 154 |
| Watermark Detection | C4 | TPR @ FPR=1% | 0.925 | 95 |
| Watermarking | Natural Questions (NQ) (test) | AUROC | 99.7 | 45 |
| Sentence-Level Watermarking | C4 | AUROC | 99.6 | 40 |
| Watermark Detection | C4 | Detection Accuracy (No Attack) | 94.6 | 24 |
| Watermarking Detection | BookSum (test) | Detection Rate (No Attack) | 97.7 | 24 |
| Large Language Model Watermarking | BookSum (test) | Average Rank | 5.04 | 20 |
| Text Quality Assessment | C4 | Average Rank | 4.73 | 20 |
| Watermark Detection | C4 | TPR @ 1% FPR (No Attack) | 96.4 | 20 |
| Watermark Detection | BOOKSUM Mistral-Small-3.1-24B-Base-2503 (test) | Latency per Token | 0.0063 | 9 |