Topic-Based Watermarks for Large Language Models

About

The indistinguishability of large language model (LLM) output from human-authored content poses significant challenges, raising concerns about potential misuse of AI-generated text and its influence on future model training. Watermarking algorithms offer a viable solution by embedding detectable signatures into generated text. However, existing watermarking methods often involve trade-offs among attack robustness, generation quality, and additional overhead such as specialized frameworks or complex integrations. We propose a lightweight, topic-guided watermarking scheme for LLMs that partitions the vocabulary into topic-aligned token subsets. Given an input prompt, the scheme selects a relevant topic-specific token list, effectively "green-listing" semantically aligned tokens to embed robust marks while preserving fluency and coherence. Experimental results across multiple LLMs and state-of-the-art benchmarks demonstrate that our method achieves text quality comparable to industry-leading systems and simultaneously improves watermark robustness against paraphrasing and lexical perturbation attacks, with minimal performance overhead. Our approach avoids reliance on additional mechanisms beyond standard text generation pipelines, enabling straightforward adoption and suggesting a practical path toward globally consistent watermarking of AI-generated content.
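The abstract describes the mechanism only at a high level. The following is a minimal sketch of the general idea, and it rests on several assumptions. It uses a toy vocabulary and hand-written topic lists. It chooses the topic from the prompt by keyword overlap and adds a fixed logit bias `delta` to the chosen topic's tokens. For detection it uses the one-proportion z-score test from the green-list watermark of Kirchenbauer et al. (2023). `VOCAB`, `TOPIC_LISTS`, `select_topic`, `biased_sample`, and `detect_z` are hypothetical names. This is not the authors' implementation, and their topic-construction and topic-matching steps may differ. The detector here is simply handed the green list, whereas a real detector would have to recover the topic from the text itself.

```python
import math
import random

# Hypothetical toy vocabulary; a real LLM vocabulary has tens of thousands of tokens.
VOCAB = ["the", "a", "game", "team", "score", "player", "stock", "market",
         "price", "bank", "rate", "goal", "trade", "win", "loss", "fund"]

# Hypothetical topic partition: each topic maps to a subset of token ids.
# How the actual scheme builds its topic-aligned lists is not shown here.
TOPIC_LISTS = {
    "sports": {VOCAB.index(w) for w in ["game", "team", "score", "player", "goal", "win"]},
    "finance": {VOCAB.index(w) for w in ["stock", "market", "price", "bank", "rate", "fund"]},
}


def select_topic(prompt: str) -> str:
    """Pick the topic whose token list overlaps the prompt words most (assumed heuristic)."""
    words = set(prompt.lower().split())

    def overlap(topic: str) -> int:
        return sum(1 for i in TOPIC_LISTS[topic] if VOCAB[i] in words)

    return max(TOPIC_LISTS, key=overlap)


def biased_sample(logits, green, delta, rng):
    """Add delta to the logits of green-list tokens, then sample one token id from the softmax."""
    adjusted = [l + delta if i in green else l for i, l in enumerate(logits)]
    m = max(adjusted)  # subtract the max for numerical stability
    weights = [math.exp(a - m) for a in adjusted]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def detect_z(token_ids, green, vocab_size):
    """One-proportion z-score: how far the green-token count exceeds its chance level."""
    gamma = len(green) / vocab_size  # expected green fraction in unwatermarked text
    t = len(token_ids)
    hits = sum(1 for tok in token_ids if tok in green)
    return (hits - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


if __name__ == "__main__":
    rng = random.Random(0)
    topic = select_topic("Who will win the game tonight")
    green = TOPIC_LISTS[topic]
    # Stand-in for model logits: uniform scores, so any green excess comes from delta.
    tokens = [biased_sample([0.0] * len(VOCAB), green, delta=2.0, rng=rng) for _ in range(200)]
    # Detection assumes the same green list is recoverable; here it is passed in directly.
    print(topic, round(detect_z(tokens, green, len(VOCAB)), 2))
```

With `delta=2.0`, green tokens dominate the samples and the z-score lands far above common detection thresholds such as 4. Setting `delta=0.0` yields unwatermarked text whose z-score stays near zero. The bias strength trades detectability against how far sampling drifts from the model's original distribution.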

Alexander Nemecek, Yuzhou Jiang, Erman Ayday • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Watermark Detection | C4 | TPR @ 1% FPR | 100 | 36
Watermark Detection | C4 OPT-6.7B | ROC-AUC | 100 | 26
Watermark Detection | C4 Gemma-7B | ROC-AUC | 0.998 | 18
Watermark Detection | C4 200 tokens (1,000 human-written samples) | FPR | 20 | 10
Watermark Detection (Threshold Sensitivity Analysis) | C4 1,000 human-written samples 200 tokens | FPR | 0.1 | 9
Text Quality Evaluation | C4 20 prompts (test) | Fluency | 3.1 | 4
Watermark Detection | OPT-6.7B No Attack | ROC-AUC | 0.999 | 2
Watermark Detection | OPT-6.7B PEGASUS | ROC-AUC | 0.959 | 2
Watermark Detection | OPT DIPPER 6.7B | ROC-AUC | 0.929 | 2
Text Quality Assessment | OPT-6.7B generated text samples (test) | Fluency | 3.23 | 2

Showing 10 of 11 rows.
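The table reports detection quality as ROC-AUC and as TPR at a 1% false-positive rate. The following hedged illustration shows how such numbers are typically computed from detector scores, for example z-scores on watermarked versus human-written texts, using the standard definitions. The page does not state the exact thresholding or scaling behind the reported values. Some results appear as fractions and others as larger numbers. `roc_auc` and `tpr_at_fpr` are hypothetical helpers, not the benchmark's evaluation code, and the scores in the example are made up.

```python
import math


def roc_auc(pos_scores, neg_scores):
    """ROC-AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def tpr_at_fpr(pos_scores, neg_scores, target_fpr=0.01):
    """TPR at a threshold that lets at most target_fpr of the negatives through."""
    neg_sorted = sorted(neg_scores, reverse=True)
    # Number of negatives allowed to score strictly above the threshold.
    k = min(math.floor(target_fpr * len(neg_sorted) + 1e-9), len(neg_sorted) - 1)
    threshold = neg_sorted[k]  # predict "watermarked" only when score > threshold
    return sum(1 for s in pos_scores if s > threshold) / len(pos_scores)


if __name__ == "__main__":
    # Hypothetical detector scores, not data from the paper.
    watermarked = [6.1, 7.4, 5.2, 8.0, 3.9]
    human = [0.3, -1.2, 1.1, 0.8, -0.4, 2.5, 0.0, -0.7, 1.6, 0.2]
    print(roc_auc(watermarked, human), tpr_at_fpr(watermarked, human))
```

Running the module prints `1.0 1.0`, because every hypothetical watermarked score exceeds the highest human score. The pairwise ROC-AUC loop is quadratic, which is fine for a few thousand samples. Larger evaluations usually compute it from sorted ranks instead.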
