Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework
About
While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection: verification requires access to the keys or to provider-side, scheme-specific detectors. This dependency creates a fundamental barrier for real-world governance, because independent auditing becomes impossible without either compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a black-box framework for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It employs a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess how closely the query text aligns with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets, and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.
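To make the idea of a "relative" test concrete, here is a minimal sketch of one generic way to position a query text against reference samples. It is an illustration under stated assumptions, not TTP-Detect's actual statistic. It assumes each text has already been reduced to a single score (for example, some statistic computed with a proxy model, where higher means more watermark-like). The function names `empirical_p_value`, `relative_position`, and `relative_test` are hypothetical. The decision uses an empirical upper-tail p-value against unwatermarked reference scores. A second, complementary measurement places the query between the means of the unwatermarked and watermarked reference sets.

```python
from statistics import mean


def empirical_p_value(query_score: float, null_scores: list[float]) -> float:
    """Upper-tail empirical p-value of query_score under reference (null) scores.

    Uses the (k + 1) / (n + 1) convention so the p-value is never exactly 0.
    """
    k = sum(1 for s in null_scores if s >= query_score)
    return (k + 1) / (len(null_scores) + 1)


def relative_position(query_score: float,
                      unwatermarked: list[float],
                      watermarked: list[float]) -> float:
    """Map the query onto a line between the two reference means.

    Roughly 0 means "looks like the unwatermarked reference",
    roughly 1 means "looks like the watermarked reference".
    """
    mu0, mu1 = mean(unwatermarked), mean(watermarked)
    if mu1 == mu0:
        raise ValueError("reference means coincide; position is undefined")
    return (query_score - mu0) / (mu1 - mu0)


def relative_test(query_score: float,
                  unwatermarked: list[float],
                  watermarked: list[float],
                  alpha: float = 0.05) -> dict:
    """Hypothetical relative test: flag the query if it is extreme under the null."""
    p = empirical_p_value(query_score, unwatermarked)
    return {
        "p_value": p,
        "position": relative_position(query_score, unwatermarked, watermarked),
        "flag_watermarked": p <= alpha,  # decision relies only on the null side
    }
```

With 20 unwatermarked reference scores, the smallest attainable p-value is 1/21 ≈ 0.048. Larger reference samples therefore permit stricter significance levels. The relative position is reported as an additional signal and does not by itself drive the decision in this sketch.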
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Watermark Detection | C4 OPT-6.7B | ROC-AUC100 | 26 |
| Watermark Detection | C4 Llama-3.1-8B | TPR100 | 8 |
| Watermark Detection | OpenGen Llama-3.1-8B | TPR100 | 8 |
| Watermark Detection | OpenGen OPT-6.7B | TPR100 | 8 |