SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation

About

Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design. To address this issue, we propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH), which partitions the semantic space of sentences. The algorithm encodes and LSH-hashes a candidate sentence generated by an LLM, and conducts sentence-level rejection sampling until the sampled sentence falls in watermarked partitions in the semantic embedding space. A margin-based constraint is used to enhance its robustness. To show the advantages of our algorithm, we propose a "bigram" paraphrase attack using the paraphrase that has the fewest bigram overlaps with the original sentence. This attack is shown to be effective against the existing token-level watermarking method. Experimental results show that our novel semantic watermark algorithm is not only more robust than the previous state-of-the-art method on both common and bigram paraphrase attacks, but also is better at preserving the quality of generation.
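The generation loop and the bigram attack described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration. `toy_embed` is a hypothetical stand-in for a real sentence encoder. The LSH uses random hyperplanes with sign bits. The watermarked partitions are chosen pseudo-randomly from a secret `key` and the previous sentence's signature. The margin check rejects embeddings whose absolute cosine to any hyperplane normal is at most `margin`. The names `gamma`, `margin`, `max_tries`, and `pick_bigram_paraphrase` are likewise hypothetical.

```python
import math
import random
from typing import Callable, List, Set, Tuple

DIM = 16     # toy embedding size (assumption)
N_BITS = 3   # number of hyperplanes, giving 2**N_BITS partitions (assumption)


def toy_embed(sentence: str) -> List[float]:
    """Hypothetical stand-in for a sentence encoder: hashed bag of words."""
    vec = [0.0] * DIM
    for word in sentence.lower().split():
        vec[sum(ord(c) for c in word) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def make_hyperplanes(seed: int) -> List[List[float]]:
    """Random hyperplane normals for sign-based LSH."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_BITS)]


def cosines(vec: List[float], planes: List[List[float]]) -> List[float]:
    """Cosine between the embedding and each hyperplane normal."""
    v_norm = math.sqrt(sum(a * a for a in vec))
    out = []
    for plane in planes:
        denom = v_norm * math.sqrt(sum(b * b for b in plane))
        dot = sum(a * b for a, b in zip(vec, plane))
        out.append(dot / denom if denom else 0.0)
    return out


def lsh_signature(vec: List[float], planes: List[List[float]]) -> int:
    """Pack one sign bit per hyperplane into an integer partition id."""
    sig = 0
    for c in cosines(vec, planes):
        sig = (sig << 1) | (1 if c > 0 else 0)
    return sig


def valid_partitions(prev_sig: int, key: int, gamma: float) -> Set[int]:
    """Assumed scheme: seed a PRNG with key and previous signature."""
    rng = random.Random(key * 1_000_003 + prev_sig)
    n = 2 ** N_BITS
    return set(rng.sample(range(n), max(1, int(gamma * n))))


def rejection_sample(prev_sentence: str, propose: Callable[[], str],
                     planes: List[List[float]], key: int = 42,
                     gamma: float = 0.5, margin: float = 0.0,
                     max_tries: int = 32) -> Tuple[str, bool]:
    """Draw candidates until one lands in a watermarked partition."""
    prev_sig = lsh_signature(toy_embed(prev_sentence), planes)
    valid = valid_partitions(prev_sig, key, gamma)
    candidate = ""
    for _ in range(max_tries):
        candidate = propose()  # in practice: sample one sentence from the LLM
        vec = toy_embed(candidate)
        far_enough = all(abs(c) > margin for c in cosines(vec, planes))
        if far_enough and lsh_signature(vec, planes) in valid:
            return candidate, True
    return candidate, False  # give up and keep the last candidate


def bigrams(sentence: str) -> Set[Tuple[str, str]]:
    words = sentence.lower().split()
    return set(zip(words, words[1:]))


def pick_bigram_paraphrase(original: str, paraphrases: List[str]) -> str:
    """Bigram attack: keep the paraphrase sharing the fewest bigrams."""
    ref = bigrams(original)
    return min(paraphrases, key=lambda p: len(ref & bigrams(p)))


if __name__ == "__main__":
    planes = make_hyperplanes(seed=0)
    pool = ["The river froze overnight", "Snow covered every roof",
            "Children skated on the pond", "Nobody left the house"]
    rng = random.Random(1)
    sentence, accepted = rejection_sample(
        "It was a cold winter", lambda: rng.choice(pool), planes, margin=0.05)
    print(sentence, accepted)
```

In a real setting `propose` would sample one sentence from the language model and `toy_embed` would be replaced by a trained sentence encoder. The fallback after `max_tries` is a design choice of this sketch so that generation always terminates.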

Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, Yulia Tsvetkov • 2023

Related benchmarks

Task	Dataset	Metric	Result
Watermark Detection	BookSum	TP @ FP=1%	98.4	154
Watermark Detection	C4	TPR @ FPR=1%	0.925	95
Watermarking	Natural Questions (NQ) (test)	AUROC	99.7	45
Sentence-Level Watermarking	C4	AUROC	99.6	40
Watermark Detection	C4	Detection Accuracy (No Attack)	94.6	24
Watermarking Detection	BookSum (test)	Detection Rate (No Attack)	97.7	24
Large Language Model Watermarking	BookSum (test)	Average Rank	5.04	20
Text Quality Assessment	C4	Average Rank	4.73	20
Watermark Detection	C4	TPR @ 1% FPR (No Attack)	96.4	20
Watermark Detection	BOOKSUM Mistral-Small-3.1-24B-Base-2503 (test)	Latency per Token	0.0063	9

Showing 10 of 21 rows
