
A Watermark for Black-Box Language Models

About

Watermarking has recently emerged as an effective strategy for detecting the outputs of large language models (LLMs). Most existing schemes require white-box access to the model's next-token probability distribution, which is typically not accessible to downstream users of an LLM API. In this work, we propose a principled watermarking scheme that requires only the ability to sample sequences from the LLM (i.e. black-box access), boasts a distortion-free property, and can be chained or nested using multiple secret keys. We provide performance guarantees, demonstrate how it can be leveraged when white-box access is available, and show when it can outperform existing white-box schemes via comprehensive experiments.
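The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of how a watermark could work with sampling-only (black-box) access. It is not the authors' scheme. It assumes a generic "draw several candidates, keep the one a secret key scores highest" approach. The names `keyed_score`, `watermark_sample`, and `sample_fn`, the word-level n-grams, and the HMAC-SHA256 scoring are all assumptions made for this example.

```python
import hashlib
import hmac


def keyed_score(key: bytes, text: str, n: int = 2) -> float:
    """Average a key-dependent pseudo-random value over the distinct word n-grams of `text`.

    Hypothetical scoring rule for illustration only: each n-gram is hashed with
    HMAC-SHA256 under the secret key and mapped to a number in [0, 1].
    """
    words = text.split()
    ngrams = {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 0))}
    if not ngrams:
        return 0.0  # too short to contain any n-gram
    total = 0.0
    for gram in ngrams:
        digest = hmac.new(key, gram.encode("utf-8"), hashlib.sha256).digest()
        total += int.from_bytes(digest[:8], "big") / 2**64
    return total / len(ngrams)


def watermark_sample(sample_fn, key: bytes, m: int = 8) -> str:
    """Draw m candidate responses via `sample_fn` and return the highest-scoring one.

    `sample_fn` stands in for a call to an LLM API that returns one sampled
    response; no access to next-token probabilities is needed.
    """
    candidates = [sample_fn() for _ in range(m)]
    return max(candidates, key=lambda text: keyed_score(key, text))
```

Under these assumptions, a detector holding the same key would recompute `keyed_score` on a piece of text and flag scores that are unusually high compared with what unwatermarked text would produce. The statistical test and the guarantees stated in the abstract are properties of the authors' scheme and are not reproduced by this sketch.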

Dara Bahri, John Wieting • 2024

Related benchmarks

Task | Dataset | Result | Rank
Large Language Model Watermarking | Mistral-7B-Instruct (test) | Perplexity (PPL): 2.61 | 34
Watermarking | eli5-category (test) | PPL: 1.61 | 28
