k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text
About
Recent watermarked generation algorithms inject detectable signatures during language generation to facilitate post-hoc detection. While token-level watermarks are vulnerable to paraphrase attacks, SemStamp (Hou et al., 2023) applies the watermark to the semantic representation of sentences and demonstrates promising robustness. SemStamp employs locality-sensitive hashing (LSH) to partition the semantic space with arbitrary hyperplanes, which results in a suboptimal tradeoff between robustness and speed. We propose k-SemStamp, a simple yet effective enhancement of SemStamp that uses k-means clustering as an alternative to LSH to partition the embedding space with awareness of its inherent semantic structure. Experimental results indicate that k-SemStamp markedly improves robustness and sampling efficiency while preserving generation quality, advancing a more effective tool for machine-generated text detection.
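The difference between the two partitioning schemes can be illustrated with a minimal sketch. It is not the authors' implementation: it uses toy 2-D vectors in place of real sentence embeddings, the function names (`lsh_region`, `kmeans`, `kmeans_region`) are hypothetical, and the watermark sampling and detection steps are left out. It only shows how a region index might be assigned to an embedding under random hyperplanes versus under learned centroids.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def lsh_region(embedding, hyperplanes):
    """LSH-style region: one bit per hyperplane, set when the embedding
    lies on the positive side of that hyperplane's normal vector."""
    bits = [1 if dot(embedding, h) > 0 else 0 for h in hyperplanes]
    return sum(bit << i for i, bit in enumerate(bits))

def kmeans_region(embedding, centroids):
    """k-means-style region: index of the nearest centroid."""
    return min(range(len(centroids)),
               key=lambda i: sq_dist(embedding, centroids[i]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm (toy version, assumed for illustration)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[kmeans_region(p, centroids)].append(p)
        # Recompute each centroid as the mean of its cluster;
        # keep the old centroid if a cluster ended up empty.
        centroids = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

# Two well-separated toy "semantic" groups.
group_a = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
group_b = [[10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]

centroids = kmeans(group_a + group_b, k=2)
regions_a = {kmeans_region(p, centroids) for p in group_a}
regions_b = {kmeans_region(p, centroids) for p in group_b}

# A hyperplane chosen without regard to the data can cut through a group.
hyperplanes = [[1.0, -1.0]]
lsh_a = {lsh_region(p, hyperplanes) for p in group_a}
```

In this toy setup the learned centroids keep each group inside a single region, whereas the fixed hyperplane splits `group_a` across two regions. Under these assumptions, that is the intuition behind partitioning "with awareness of inherent semantic structure": nearby embeddings are less likely to cross a region boundary.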
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Watermark Detection | BookSum | TP @ FP=1% | 98.2 | 154 |
| Watermark Detection | C4 | TPR @ FPR=1% | 0.565 | 95 |
| Watermarking | Natural Questions (NQ) (test) | AUROC | 100 | 45 |
| Sentence-Level Watermarking | C4 | AUROC | 99.3 | 40 |
| Watermark Detection | C4 | Detection Accuracy (No Attack) | 100 | 24 |
| Watermarking Detection | BookSum (test) | Detection Rate (No Attack) | 99.6 | 24 |
| Large Language Model Watermarking | BookSum (test) | Average Rank | 4.94 | 20 |
| Text Quality Assessment | C4 | Average Rank | 4.93 | 20 |
| Watermark Detection | C4 | TPR @ 1% FPR (No Attack) | 92.5 | 20 |
| Watermark Detection Robustness | C4 | TP@FP=1% | 0.00e+0 | 12 |