k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text

About

Recent watermarked generation algorithms inject detectable signatures during language generation to facilitate post-hoc detection. While token-level watermarks are vulnerable to paraphrase attacks, SemStamp (Hou et al., 2023) applies the watermark to the semantic representation of sentences and demonstrates promising robustness. SemStamp employs locality-sensitive hashing (LSH) to partition the semantic space with arbitrary hyperplanes, which results in a suboptimal tradeoff between robustness and speed. We propose k-SemStamp, a simple yet effective enhancement of SemStamp that uses k-means clustering as an alternative to LSH, partitioning the embedding space with awareness of its inherent semantic structure. Experimental results indicate that k-SemStamp markedly improves robustness and sampling efficiency while preserving generation quality, advancing a more effective tool for machine-generated text detection.
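
To make the contrast between the two partitioning schemes concrete, here is a minimal sketch, not the authors' implementation. It assumes random Gaussian vectors as stand-ins for sentence embeddings, a small hand-written Lloyd's k-means, and cosine similarity for assigning an embedding to a centroid; the function names `lsh_partition`, `kmeans_fit`, and `kmeans_partition` are hypothetical.

```python
import numpy as np

def lsh_partition(x, hyperplanes):
    """Region index from random hyperplanes: one sign bit per hyperplane."""
    bits = (hyperplanes @ x > 0).astype(int)
    return int(bits @ (1 << np.arange(len(bits))))

def kmeans_fit(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on unit-normalized rows (assumed setup).

    Returns k unit-norm centroids; assignment uses cosine similarity.
    """
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Fancy indexing copies, so updating centroids does not modify X.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(X @ centroids.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                mean = members.mean(axis=0)
                centroids[j] = mean / np.linalg.norm(mean)
    return centroids

def kmeans_partition(x, centroids):
    """Region index = nearest centroid by cosine similarity."""
    x = x / np.linalg.norm(x)
    return int(np.argmax(centroids @ x))

# Stand-in data: 200 "sentence embeddings" of dimension 16 (assumption).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))

hyperplanes = rng.normal(size=(3, 16))   # 3 hyperplanes -> 2**3 = 8 LSH regions
centroids = kmeans_fit(embeddings, k=8)  # 8 data-dependent regions

lsh_region = lsh_partition(embeddings[0], hyperplanes)
km_region = kmeans_partition(embeddings[0], centroids)
```

The LSH regions are fixed by hyperplanes drawn independently of the data, whereas the k-means regions are fitted to the embeddings themselves, which is the "awareness of inherent semantic structure" the abstract refers to.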

Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, Tianxing He • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Watermark Detection | BookSum | TP @ FP=1% | 98.2 | 154 |
| Watermark Detection | C4 | TPR @ FPR=1% | 0.565 | 95 |
| Watermarking | Natural Questions (NQ) (test) | AUROC | 100 | 45 |
| Sentence-Level Watermarking | C4 | AUROC | 99.3 | 40 |
| Watermark Detection | C4 | Detection Accuracy (No Attack) | 100 | 24 |
| Watermarking Detection | BookSum (test) | Detection Rate (No Attack) | 99.6 | 24 |
| Large Language Model Watermarking | BookSum (test) | Average Rank | 4.94 | 20 |
| Text Quality Assessment | C4 | Average Rank | 4.93 | 20 |
| Watermark Detection | C4 | TPR @ 1% FPR (No Attack) | 92.5 | 20 |
| Watermark Detection Robustness | C4 | TP@FP=1% | 0.00e+0 | 12 |
Showing 10 of 19 rows
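
Several rows above report a true-positive rate at a fixed 1% false-positive rate. As a rough illustration of how such a figure is commonly computed, the sketch below picks a threshold from detection scores on human-written text and measures the fraction of watermarked texts that exceed it. The function name `tpr_at_fpr` and the toy scores are assumptions for illustration only; they are not taken from the paper or from the benchmarks listed here.

```python
import numpy as np

def tpr_at_fpr(human_scores, watermarked_scores, target_fpr=0.01):
    """Return (TPR, threshold) with the empirical FPR held at or below target_fpr.

    A text is flagged as watermarked when its score is strictly above the threshold.
    """
    human = np.sort(np.asarray(human_scores, dtype=float))
    watermarked = np.asarray(watermarked_scores, dtype=float)
    # Number of human texts we may wrongly flag without exceeding the target.
    n_allowed = int(np.floor(target_fpr * len(human)))
    # At most n_allowed human scores lie strictly above this value.
    threshold = human[len(human) - n_allowed - 1]
    tpr = float(np.mean(watermarked > threshold))
    return tpr, float(threshold)

# Toy detection scores (hypothetical): 100 human texts, 4 watermarked texts.
human_scores = np.arange(100, dtype=float)
watermarked_scores = [98.5, 99.5, 50.0, 200.0]

tpr, threshold = tpr_at_fpr(human_scores, watermarked_scores)
```

The function returns a fraction between 0 and 1; multiplying by 100 gives the percentage form.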
