
LLM Fingerprinting via Semantically Conditioned Watermarks

About

Most LLM fingerprinting methods teach the model to respond to a few fixed queries with predefined atypical responses (keys). This memorization often does not survive common deployment steps such as finetuning or quantization, and such keys can be easily detected and filtered from LLM responses, ultimately breaking the fingerprint. To overcome these limitations, we introduce LLM fingerprinting via semantically conditioned watermarks, replacing fixed query sets with a broad semantic domain, and replacing brittle atypical keys with a statistical watermarking signal diffused throughout each response. After teaching the model to watermark its responses only to prompts from a predetermined domain (e.g., the French language), the model owner can use queries from that domain to reliably detect the fingerprint and verify ownership. As we confirm in our thorough experimental evaluation, our fingerprint is both stealthy and robust to all common deployment scenarios.
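To make the detection step concrete, here is a minimal sketch of how an owner might test responses to in-domain queries for a statistical watermark. The abstract does not say which watermarking scheme is used, so this assumes a generic red-green token scheme with a keyed hash. The names `KEY`, `GAMMA`, `is_green`, `z_score`, `fingerprint_detected`, and `toy_sequence`, along with the threshold of 4.0, are hypothetical and not taken from the paper.

```python
import hashlib
import math

KEY = "owner-secret"  # hypothetical secret held by the model owner
GAMMA = 0.5           # assumed fraction of tokens counted as "green"


def is_green(prev_token: str, token: str, key: str = KEY) -> bool:
    """Keyed hash of (previous token, token) decides green vs. red."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] / 256 < GAMMA


def z_score(responses: list[list[str]], key: str = KEY) -> float:
    """One-sided z-statistic for the green count, pooled over all responses."""
    green = 0
    total = 0
    for tokens in responses:
        for prev, tok in zip(tokens, tokens[1:]):
            green += is_green(prev, tok, key)
            total += 1
    if total == 0:
        return 0.0
    return (green - GAMMA * total) / math.sqrt(total * GAMMA * (1 - GAMMA))


def fingerprint_detected(responses: list[list[str]], threshold: float = 4.0) -> bool:
    """Claim ownership only if the pooled z-score clears the threshold."""
    return z_score(responses) > threshold


def toy_sequence(n: int, green: bool = True, key: str = KEY) -> list[str]:
    """Toy stand-in for model output: always picks a green (or red) next token."""
    vocab = [f"w{i}" for i in range(64)]
    tokens = ["w0"]
    for _ in range(n):
        prev = tokens[-1]
        tokens.append(next(t for t in vocab if is_green(prev, t, key) == green))
    return tokens


if __name__ == "__main__":
    in_domain = [toy_sequence(200, green=True)]       # mimics watermarked replies
    out_of_domain = [toy_sequence(200, green=False)]  # mimics replies with no green bias
    print(fingerprint_detected(in_domain), fingerprint_detected(out_of_domain))
```

In this sketch the owner would send only prompts from the fingerprint domain (for example, French queries) and pool the tokenized responses before testing. Because the signal is statistical and spread across every token, pooling more responses raises the z-score of a fingerprinted model, while an unrelated model is expected to stay near zero.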

Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev • 2025

Related benchmarks

Task                               Dataset                                                   Result                        Rank
Fingerprint Robustness Evaluation  Prominent Deployment Scenarios Robustness Evaluation 1.0  Fingerprint Success Rate 100  24
Fingerprint Detection              WildChat Fr                                               FSR 1                         18
Fingerprint Detection              Active Output Modification                                FSR 100                       18
Fingerprint Robustness Evaluation  System Prompts Pirate                                     FSR 100                       9
Fingerprint Robustness Evaluation  System Prompts Weather                                    FSR 100                       9
Fingerprint Robustness Evaluation  Active Input Translation                                  FSR 1                         9
Fingerprint Robustness Evaluation  Active Output Translation                                 FSR 1                         9
Fingerprint Robustness Evaluation  System Prompts Robot                                      FSR 1                         9
Fingerprint Robustness Evaluation  System Prompts OAI                                        FSR 100                       9
Fingerprint Detection              English System Prompts                                    FSR 100                       9
