
Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique

About

Growing concerns over the theft and misuse of Large Language Models (LLMs) have heightened the need for effective fingerprinting, which links a model to its original version to detect misuse. In this paper, we define five key properties for a successful fingerprint: Transparency, Efficiency, Persistence, Robustness, and Unforgeability. We introduce a novel fingerprinting framework that provides verifiable proof of ownership while maintaining fingerprint integrity. Our approach makes two main contributions. First, we propose a Chain and Hash technique that cryptographically binds fingerprint prompts with their responses, ensuring no adversary can generate colliding fingerprints and allowing model owners to irrefutably demonstrate their creation. Second, we address a realistic threat model in which instruction-tuned models' output distribution can be significantly altered through meta-prompts. By integrating random padding and varied meta-prompt configurations during training, our method preserves fingerprint robustness even when the model's output style is significantly modified. Experimental results demonstrate that our framework offers strong security for proving ownership and remains resilient against benign transformations like fine-tuning, as well as adversarial attempts to erase fingerprints. Finally, we also demonstrate its applicability to fingerprinting LoRA adapters.
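The abstract's core idea is that each fingerprint prompt's expected response is derived from a hash over the entire prompt set, so no individual prompt/response pair can be forged or swapped in isolation. The sketch below illustrates that binding in Python; it is a simplified, illustrative construction under assumed details (SHA-256, a flat candidate-response pool, digest-modulo indexing), not the paper's exact scheme.

```python
# Illustrative Chain-and-Hash style binding (a sketch, not the paper's
# exact construction): every fingerprint response is derived from a hash
# that commits to ALL prompts and the candidate-response pool at once.
import hashlib

def chain_and_hash(prompts, response_pool):
    """Deterministically bind each fingerprint prompt to a response."""
    # The "chain": a single commitment over the whole prompt set and
    # the response pool, so changing any element changes every binding.
    chain = hashlib.sha256()
    for p in prompts:
        chain.update(p.encode())
    for r in response_pool:
        chain.update(r.encode())
    commitment = chain.digest()

    fingerprint = {}
    for p in prompts:
        # Hash the global commitment together with this prompt; the
        # digest indexes into the pool to pick this prompt's response.
        h = hashlib.sha256(commitment + p.encode()).digest()
        idx = int.from_bytes(h, "big") % len(response_pool)
        fingerprint[p] = response_pool[idx]
    return fingerprint
```

The model owner would then fine-tune the model on these (prompt, response) pairs, per the abstract with random padding and varied meta-prompt configurations mixed in, and later prove ownership by revealing the prompts and showing the hashes reproduce the model's responses.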

Mark Russinovich, Ahmed Salem • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Attack Success Rate | CTCC fingerprinting scenario b | SVA | 100 | 18 |
| General Capability Harmlessness | General LLM Task Benchmark | Average Accuracy | 59.8 | 12 |
| Input Perturbation Robustness | Input Perturbation Remove 5% | FSR (5% Removal) | 100 | 10 |
| Fingerprinting Effectiveness | Fingerprinted Model Clean | FSR | 100 | 10 |
| Input Perturbation Robustness | Input Perturbation Remove 10% | FSR | 92 | 10 |
| Fine-tuning Robustness | ShareGPT | FSR | 1.00e+3 | 10 |
| Fine-tuning Robustness | Alpaca Dataset | FSR | 0.00e+0 | 10 |
| Fine-tuning Robustness | Dolly Dataset | FSR | 0.00e+0 | 10 |
| Input Stealthiness Assessment | Narrative-based corpus | PPL | 86.31 | 8 |
| Input Stealthiness Evaluation | Fingerprint Input Triggers (test) | Perplexity (PPL) | 86.31 | 6 |
(Showing 10 of 14 rows.)
