Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework

About

While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring access to keys or provider-side scheme-specific detectors for verification. This dependency creates a fundamental barrier for real-world governance, as independent auditing becomes impossible without compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a pioneering black-box framework designed for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It employs a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess the alignment of the query text with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.
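The abstract frames verification as a relative test: the query text is compared against reference distributions rather than checked with a secret key. The following is a minimal sketch of that general idea only, not the TTP-Detect procedure. It assumes each text has already been reduced to a single scalar score by some proxy model (the abstract does not specify how), and the function names, the single-score setup, and the threshold `alpha` are all hypothetical.

```python
from statistics import mean


def empirical_p_value(query_score, null_scores):
    """Smoothed fraction of unwatermarked reference scores >= the query score."""
    exceed = sum(1 for s in null_scores if s >= query_score)
    return (exceed + 1) / (len(null_scores) + 1)


def relative_decision(query_score, wm_scores, null_scores, alpha=0.05):
    """Flag the query as watermarked only if it is (a) unusual under the
    unwatermarked reference sample and (b) closer to the watermarked
    reference mean than to the unwatermarked one.

    Assumption: higher scores indicate stronger watermark evidence.
    """
    p_null = empirical_p_value(query_score, null_scores)
    dist_wm = abs(query_score - mean(wm_scores))
    dist_null = abs(query_score - mean(null_scores))
    return p_null <= alpha and dist_wm < dist_null


if __name__ == "__main__":
    # Toy reference samples (hypothetical scores, not data from the paper).
    null_scores = list(range(20))      # unwatermarked references: 0..19
    wm_scores = list(range(30, 50))    # watermarked references: 30..49
    print(relative_decision(40, wm_scores, null_scores))
    print(relative_decision(5, wm_scores, null_scores))
```

The decision here depends only on where the query falls relative to the two reference samples, which is why no key or provider-side detector is needed in this toy setup. The paper describes a suite of complementary measurements; this sketch uses a single one for brevity.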

Zhuoshang Wang, Yubing Ren, Yanan Cao, Fang Fang, Xiaoxue Li, Li Guo • 2026

Related benchmarks

Task	Dataset	Result
Watermark Detection	C4 OPT-6.7B	ROC-AUC 100	26
Watermark Detection	C4 Llama-3.1-8B	TPR 100	8
Watermark Detection	OpenGen Llama-3.1-8B	TPR 100	8
Watermark Detection	OpenGen OPT-6.7B	TPR 100	8

