LLM Fingerprinting via Semantically Conditioned Watermarks

About

Most LLM fingerprinting methods teach the model to respond to a few fixed queries with predefined atypical responses (keys). This memorization often does not survive common deployment steps such as finetuning or quantization, and such keys are easily detected and filtered from LLM responses, which breaks the fingerprint. To overcome these limitations, we introduce LLM fingerprinting via semantically conditioned watermarks, replacing fixed query sets with a broad semantic domain, and replacing brittle atypical keys with a statistical watermarking signal diffused throughout each response. After teaching the model to watermark its responses only to prompts from a predetermined domain (e.g., the French language), the model owner can use queries from that domain to reliably detect the fingerprint and verify ownership. As we confirm in our thorough experimental evaluation, our fingerprint is both stealthy and robust to all common deployment scenarios.
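
The verification step described above can be illustrated with a small sketch. The abstract does not say which watermarking scheme is used, so this example assumes a generic red/green-list watermark with a one-sided z-test, pooled over several responses to in-domain queries. All names here (GAMMA, SECRET_KEY, is_green, green_count, z_score, p_value, fingerprint_detected) are hypothetical and are not taken from the paper.

```python
import hashlib
import math

GAMMA = 0.5                   # assumed fraction of the vocabulary on the "green" list
SECRET_KEY = "owner-secret"   # hypothetical key held by the model owner


def is_green(prev_token, token, key=SECRET_KEY):
    """Pseudo-randomly mark `token` as green, seeded by the previous token and the key."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def green_count(tokens, key=SECRET_KEY):
    """Return (number of green tokens, number of scored tokens) for one response."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(is_green(p, t, key) for p, t in pairs), len(pairs)


def z_score(green, n):
    """Standardized excess of green tokens over the GAMMA * n expected without a watermark."""
    if n == 0:
        return 0.0
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


def p_value(z):
    """One-sided p-value under a standard normal approximation."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def fingerprint_detected(responses, alpha=1e-3, key=SECRET_KEY):
    """Pool token counts over all responses and test for a watermark at level alpha."""
    counts = [green_count(r, key) for r in responses]
    green = sum(g for g, _ in counts)
    n = sum(m for _, m in counts)
    return p_value(z_score(green, n)) < alpha
```

Under these assumptions, the owner would tokenize the suspect model's responses to queries from the chosen domain (for instance, French prompts) and pass the token-id lists to fingerprint_detected. Responses to prompts outside the domain should carry no watermark, so their z-score would be expected to stay near zero.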

Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev • 2025

Related benchmarks

Task	Dataset	Result
Fingerprint Robustness Evaluation	Prominent Deployment Scenarios Robustness Evaluation 1.0	Fingerprint Success Rate 100	24
Fingerprint Detection	WildChat Fr	FSR 1	18
Fingerprint Detection	Active Output Modification	FSR 100	18
Fingerprint Robustness Evaluation	System Prompts Pirate	FSR 100	9
Fingerprint Robustness Evaluation	System Prompts Weather	FSR 100	9
Fingerprint Robustness Evaluation	Active Input Translation	FSR 1	9
Fingerprint Robustness Evaluation	Active Output Translation	FSR 1	9
Fingerprint Robustness Evaluation	System Prompts Robot	FSR 1	9
Fingerprint Robustness Evaluation	System Prompts OAI	FSR 100	9
Fingerprint Detection	English System Prompts	FSR 100	9

