
StealthInk: A Multi-bit and Stealthy Watermark for Large Language Models

About

Watermarking for large language models (LLMs) offers a promising approach to identifying AI-generated text. Existing approaches, however, either distort the distribution of the text the LLM would naturally generate or are limited to embedding zero-bit information, which allows watermark detection but not identification of the source. We present StealthInk, a stealthy multi-bit watermarking scheme that preserves the original text distribution while embedding provenance data, such as userID, TimeStamp, and modelID, within LLM-generated text. This enables fast traceability without requiring access to the language model's API or prompts. We derive a lower bound on the number of tokens necessary for watermark detection at a fixed equal error rate, which provides insight into how to enhance capacity. Comprehensive empirical evaluations across diverse tasks highlight the stealthiness, detectability, and resilience of StealthInk, establishing it as an effective solution for LLM watermarking applications.
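To make the multi-bit idea concrete, here is a minimal sketch of packing provenance fields into a fixed-width bit payload before embedding. This is illustrative only, not StealthInk's actual encoding; the field widths (16-bit userID, 16-bit timestamp, 4-bit modelID) are assumptions chosen for the example.

```python
# Illustrative payload packing for a multi-bit watermark.
# Field widths are assumptions for this sketch, not from the paper.

def pack_payload(user_id: int, timestamp: int, model_id: int) -> str:
    """Concatenate fields into a 36-bit string:
    16-bit userID | 16-bit timestamp | 4-bit modelID."""
    assert 0 <= user_id < 2**16
    assert 0 <= timestamp < 2**16
    assert 0 <= model_id < 2**4
    return f"{user_id:016b}{timestamp:016b}{model_id:04b}"

def unpack_payload(bits: str) -> tuple[int, int, int]:
    """Inverse of pack_payload: recover the three fields from the bit string."""
    assert len(bits) == 36
    return int(bits[:16], 2), int(bits[16:32], 2), int(bits[32:36], 2)
```

The watermark embedder would then bias token sampling to encode this bit string, and the detector would attempt to recover it from the text alone.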

Ya Jiang, Chuxiong Wu, Massieh Kordi Boroujeny, Brian Mark, Kai Zeng• 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Text Completion | Text Completion | Binary Accuracy: 92.5 | 27 |
| Text Summarization | Text Summarization | BA: 71.5 | 24 |
| Text Completion | OpenGen | Bit Accuracy: 71.69 | 20 |
| Text Completion | Essays | Binary Accuracy: 72.12 | 20 |
| Text Completion | OpenGen (test) | Bit Accuracy (BA): 80.12 | 16 |
| Text Completion | Essays (test) | BA: 79.25 | 16 |
| Multi-bit Watermarking | LLM text, 200 tokens | Perplexity: 7.2339 | 14 |
| Multi-bit Watermarking | LLaMA2-7B, 300 tokens (test) | Perplexity: 7.8241 | 14 |
| Watermark Detectability | 400-token texts, paraphrasing attack (test) | AUC: 51.88 | 13 |
| Text Quality Evaluation | LLM-generated text, 300 tokens, 36 bits | Distinct-2: 94.98 | 12 |
Showing 10 of 28 rows
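Several rows report Bit Accuracy (BA): the fraction of embedded payload bits that the detector recovers correctly, expressed as a percentage. A minimal sketch of that metric, assuming payloads are represented as equal-length bit strings:

```python
def bit_accuracy(embedded: str, decoded: str) -> float:
    """Fraction of payload bits recovered correctly.
    The BA values in the table are this quantity times 100."""
    assert len(embedded) == len(decoded), "payloads must be the same length"
    matches = sum(a == b for a, b in zip(embedded, decoded))
    return matches / len(embedded)
```

For example, decoding 3 of 4 bits correctly gives a BA of 75.0; random guessing on a long payload converges to about 50.0, which is why post-attack detectability scores near 50 (such as the paraphrasing-attack AUC above) indicate near-chance performance.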
