
Provable Robust Watermarking for AI-Generated Text

About

We study the problem of watermarking text generated by large language models (LLMs), one of the most promising approaches for addressing the safety challenges of LLM usage. In this paper, we propose a rigorous theoretical framework to quantify the effectiveness and robustness of LLM watermarks. We also introduce a robust and high-quality watermark method, Unigram-Watermark, by extending an existing approach with a simplified fixed grouping strategy. We prove that our watermark method enjoys guaranteed generation quality and correctness in watermark detection, and that it is robust against text editing and paraphrasing. Experiments on three different LLMs and two datasets verify that Unigram-Watermark achieves superior detection accuracy and comparable generation quality in perplexity, thus promoting the responsible use of LLMs. Code is available at https://github.com/XuandongZhao/Unigram-Watermark.
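The "fixed grouping strategy" can be illustrated with a toy sketch. The abstract does not spell out the mechanics, so everything below is an assumption for illustration: a single key-seeded "green" subset of the vocabulary reused at every position, a constant logit boost `delta` for green tokens, and a z-score on the green-token count as the detection statistic. The function names, the settings (`VOCAB`, `GAMMA`, `DELTA`, `KEY`), and the uniform-logit stand-in for an LLM are hypothetical and are not taken from the authors' repository.

```python
import math
import random


def make_green_list(vocab_size, gamma, key):
    """Fixed grouping: one key-seeded "green" subset reused at every position."""
    ids = list(range(vocab_size))
    random.Random(key).shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def boost_logits(logits, green, delta):
    """Add delta to every green-token logit before sampling."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def sample_tokens(logits, n, rng):
    """Draw n token ids from softmax(logits); a stand-in for a real LLM."""
    weights = [math.exp(x) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=n)


def z_score(token_ids, green, gamma):
    """Standardized green-token count (assumed detection statistic)."""
    n = len(token_ids)
    hits = sum(t in green for t in token_ids)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


# Hypothetical settings, chosen only to make the toy example readable.
VOCAB, GAMMA, DELTA, KEY = 1000, 0.5, 2.0, 42
green = make_green_list(VOCAB, GAMMA, KEY)
rng = random.Random(0)
plain = sample_tokens([0.0] * VOCAB, 200, rng)
marked = sample_tokens(boost_logits([0.0] * VOCAB, green, DELTA), 200, rng)
print(f"z (unwatermarked): {z_score(plain, green, GAMMA):.2f}")
print(f"z (watermarked):   {z_score(marked, green, GAMMA):.2f}")
```

Because the green list in this sketch does not depend on neighboring tokens, editing or reordering some tokens changes only those tokens' contribution to the count, which gives an intuition for the robustness claim in the abstract.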

Xuandong Zhao, Prabhanjan Ananth, Lei Li, Yu-Xiang Wang • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Language Modeling | C4 | Perplexity: 10.21 | 1182 |
| Mathematical Reasoning | GSM8K (test) | Accuracy: 12.21 | 751 |
| Question Answering | TruthfulQA | Truthful*Inf Score: 71.96 | 42 |
| NLP Watermarking | WaterBench & RepoBench-P (test) | KoLA Score: 1.8 | 24 |
| Text Generation | C4 | TPR @ FPR=1%: 99.88 | 15 |
| Machine-generated text detection | OpenGen No Editing | TPR: 1 | 3 |
| Machine-generated text detection | LFQA No Editing | TPR: 100 | 3 |
| Machine-generated text detection | OpenGen 10% Editing | TPR: 99.2 | 3 |
| Machine-generated text detection | LFQA (10% Editing) | TPR: 99.7 | 3 |
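For readers unfamiliar with the "TPR @ FPR=1%" metric in the benchmark results, the following minimal sketch shows one common way to compute it: choose a detection threshold from scores on unwatermarked text so that at most the target fraction is falsely flagged, then report the fraction of watermarked texts scoring above that threshold. The function name and the example scores are hypothetical, and the listed benchmarks may use a different thresholding convention.

```python
def tpr_at_fpr(neg_scores, pos_scores, target_fpr=0.01):
    """Return (TPR, threshold) with the false positive rate held at or below target_fpr.

    neg_scores: detection scores on unwatermarked (negative) texts.
    pos_scores: detection scores on watermarked (positive) texts.
    A text is flagged when its score is strictly above the threshold.
    """
    neg = sorted(neg_scores, reverse=True)
    # Number of negatives we may allow above the threshold.
    k = min(int(target_fpr * len(neg)), len(neg) - 1)
    # At most k negatives score strictly above neg[k], so FPR <= k / len(neg).
    threshold = neg[k]
    tpr = sum(s > threshold for s in pos_scores) / len(pos_scores)
    return tpr, threshold


# Made-up scores for illustration only.
example_tpr, example_threshold = tpr_at_fpr(
    neg_scores=[0.0, 1.0, 2.0, 3.0],
    pos_scores=[1.0, 2.5, 3.0, 4.0],
    target_fpr=0.25,
)
print(example_tpr, example_threshold)
```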
