SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation

About

Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design. To address this issue, we propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH), which partitions the semantic space of sentences. The algorithm encodes and LSH-hashes a candidate sentence generated by an LLM, and conducts sentence-level rejection sampling until the sampled sentence falls in watermarked partitions in the semantic embedding space. A margin-based constraint is used to enhance its robustness. To show the advantages of our algorithm, we propose a "bigram" paraphrase attack using the paraphrase that has the fewest bigram overlaps with the original sentence. This attack is shown to be effective against the existing token-level watermarking method. Experimental results show that our novel semantic watermark algorithm is not only more robust than the previous state-of-the-art method on both common and bigram paraphrase attacks, but also is better at preserving the quality of generation.
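The sentence-level rejection sampling described above can be illustrated with a minimal sketch. It assumes a random-hyperplane (sign-based) form of LSH and a cosine-distance reading of the margin constraint; the abstract confirms neither detail. All names here (`make_hyperplanes`, `lsh_signature`, `min_margin`, `watermarked_sample`, `propose`, `embed`) are hypothetical, and the set of watermarked partitions is passed in as a fixed set for simplicity, since the abstract does not say how that set is chosen.

```python
import numpy as np


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random unit normal vectors; each one splits the space in two."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, dim))
    return planes / np.linalg.norm(planes, axis=1, keepdims=True)


def lsh_signature(embedding, planes):
    """Integer partition id: bit i is 1 when the embedding lies on the positive side of plane i."""
    bits = planes @ embedding > 0
    return sum(int(b) << i for i, b in enumerate(bits))


def min_margin(embedding, planes):
    """Smallest |cosine| between the embedding and any plane normal (planes must be unit-norm)."""
    cos = planes @ embedding / np.linalg.norm(embedding)
    return float(np.min(np.abs(cos)))


def watermarked_sample(propose, embed, planes, valid, margin=0.0, max_tries=100):
    """Resample sentences until one lands in a valid partition, away from the boundaries.

    propose: hypothetical stand-in for the LLM; returns one candidate sentence per call.
    embed:   hypothetical stand-in for the sentence encoder; returns a 1-D vector.
    valid:   set of partition ids treated as watermarked (an assumption of this sketch).
    Returns (sentence, accepted); falls back to the last candidate when tries run out.
    """
    candidate = None
    for _ in range(max_tries):
        candidate = propose()
        vec = np.asarray(embed(candidate), dtype=float)
        in_valid = lsh_signature(vec, planes) in valid
        if in_valid and min_margin(vec, planes) >= margin:
            return candidate, True
    return candidate, False
```

A detector built on the same assumptions would recompute `lsh_signature` for each sentence of a text and check whether the fraction that falls in `valid` is higher than chance. Under this reading, the margin rejects sentences whose embeddings sit close to a partition boundary, where a small semantic shift from paraphrasing could flip a bit of the signature.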

Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, Yulia Tsvetkov • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Watermark Detection | C4 | Detection Accuracy (No Attack) | 94.6 | 24 |
| Watermarking Detection | BookSum (test) | Detection Rate (No Attack) | 97.7 | 24 |
| Machine Translation | WMT De-En 19 | COMET | 87.4 | 6 |
| Summarization | Xsum | ROUGE-L | 20 | 6 |
| Code Generation | MBPP | Pass@1 | 34.2 | 5 |
| Watermark Detection | C4 | TPR @ FPR=1% | 0.925 | 5 |
| Watermarking Token Efficiency | BookSum (test) | Avg Tokens per Sentence | 1.69e+3 | 5 |
| Paragraph Translation | WMT 23 (test) | BLEU | 42.8 | 4 |
| Open-ended generation | C4 RealNews | Perplexity | 3.6 | 4 |