LLM Fingerprinting via Semantically Conditioned Watermarks
About
Most LLM fingerprinting methods teach the model to respond to a few fixed queries with predefined atypical responses (keys). This memorization often does not survive common deployment steps such as finetuning or quantization, and the keys can be easily detected and filtered from LLM responses, which breaks the fingerprint. To overcome these limitations, we introduce LLM fingerprinting via semantically conditioned watermarks: we replace fixed query sets with a broad semantic domain, and replace brittle atypical keys with a statistical watermarking signal diffused throughout each response. Once the model has been taught to watermark its responses only for prompts from a predetermined domain (e.g., the French language), the model owner can use queries from that domain to reliably detect the fingerprint and verify ownership. As we confirm in our thorough experimental evaluation, our fingerprint is both stealthy and robust to all common deployment scenarios.
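The verification step described above can be illustrated with a minimal sketch. The abstract does not say which watermarking scheme is used, so this example assumes a red-green-list style watermark in which each token is pseudo-randomly marked "green" based on the previous token and a secret key, and the owner runs a one-sided z-test on the green-token count pooled over responses to in-domain queries. All names (`is_green`, `fingerprint_z_score`, `is_fingerprinted`, `GAMMA`, the key string, the threshold of 4.0) are hypothetical and not taken from the paper.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary marked "green" at each step


def is_green(prev_token: int, token: int, secret_key: str = "owner-key") -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by the
    previous token and the owner's secret key (assumed scheme)."""
    digest = hashlib.sha256(f"{secret_key}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def count_green(tokens, secret_key: str = "owner-key"):
    """Return (green_count, scored_count) for one tokenized response.
    The first token has no predecessor, so it is not scored."""
    pairs = list(zip(tokens, tokens[1:]))
    green = sum(is_green(prev, tok, secret_key) for prev, tok in pairs)
    return green, len(pairs)


def fingerprint_z_score(responses, secret_key: str = "owner-key") -> float:
    """Pool green-token counts over all responses and compute a z-score
    against the null hypothesis that tokens are green with probability GAMMA."""
    green = scored = 0
    for tokens in responses:
        g, n = count_green(tokens, secret_key)
        green += g
        scored += n
    if scored == 0:
        return 0.0
    return (green - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))


def is_fingerprinted(responses, threshold: float = 4.0,
                     secret_key: str = "owner-key") -> bool:
    """Flag the model as fingerprinted if the pooled z-score exceeds
    an (assumed) detection threshold."""
    return fingerprint_z_score(responses, secret_key) > threshold
```

In this sketch, the owner would send prompts from the chosen domain (for example, French-language queries) to the suspect model, tokenize the responses, and pass the token-ID lists to `is_fingerprinted`. Pooling over many responses is a design choice that keeps the test sensitive even when each individual response carries only a weak signal.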
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Fingerprint Robustness Evaluation | Prominent Deployment Scenarios Robustness Evaluation 1.0 | Fingerprint Success Rate (FSR): 100 | 24 |
| Fingerprint Detection | WildChat Fr | FSR: 1 | 18 |
| Fingerprint Detection | Active Output Modification | FSR: 100 | 18 |
| Fingerprint Robustness Evaluation | System Prompts Pirate | FSR: 100 | 9 |
| Fingerprint Robustness Evaluation | System Prompts Weather | FSR: 100 | 9 |
| Fingerprint Robustness Evaluation | Active Input Translation | FSR: 1 | 9 |
| Fingerprint Robustness Evaluation | Active Output Translation | FSR: 1 | 9 |
| Fingerprint Robustness Evaluation | System Prompts Robot | FSR: 1 | 9 |
| Fingerprint Robustness Evaluation | System Prompts OAI | FSR: 100 | 9 |
| Fingerprint Detection | English System Prompts | FSR: 100 | 9 |