
Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique

About

Growing concerns over the theft and misuse of Large Language Models (LLMs) have heightened the need for effective fingerprinting, which links a model to its original version to detect misuse. In this paper, we define five key properties for a successful fingerprint: Transparency, Efficiency, Persistence, Robustness, and Unforgeability. We introduce a novel fingerprinting framework that provides verifiable proof of ownership while maintaining fingerprint integrity. Our approach makes two main contributions. First, we propose a Chain and Hash technique that cryptographically binds fingerprint prompts with their responses, ensuring no adversary can generate colliding fingerprints and allowing model owners to irrefutably demonstrate their creation. Second, we address a realistic threat model in which instruction-tuned models' output distribution can be significantly altered through meta-prompts. By integrating random padding and varied meta-prompt configurations during training, our method preserves fingerprint robustness even when the model's output style is significantly modified. Experimental results demonstrate that our framework offers strong security for proving ownership and remains resilient against benign transformations like fine-tuning, as well as adversarial attempts to erase fingerprints. Finally, we also demonstrate its applicability to fingerprinting LoRA adapters.
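The abstract's core idea is that each fingerprint prompt's expected response is derived from a hash over the entire prompt set, so no individual prompt/response pair can be forged or swapped in isolation. The sketch below illustrates that binding in Python; it is a simplified, illustrative construction under assumed details (SHA-256, a flat candidate-response pool, digest-modulo indexing), not the paper's exact scheme.

```python
# Illustrative Chain-and-Hash style binding (a sketch, not the paper's
# exact construction): every fingerprint response is derived from a hash
# that commits to ALL prompts and the candidate-response pool at once.
import hashlib

def chain_and_hash(prompts, response_pool):
    """Deterministically bind each fingerprint prompt to a response."""
    # The "chain": a single commitment over the whole prompt set and
    # the response pool, so changing any element changes every binding.
    chain = hashlib.sha256()
    for p in prompts:
        chain.update(p.encode())
    for r in response_pool:
        chain.update(r.encode())
    commitment = chain.digest()

    fingerprint = {}
    for p in prompts:
        # Hash the global commitment together with this prompt; the
        # digest indexes into the pool to pick this prompt's response.
        h = hashlib.sha256(commitment + p.encode()).digest()
        idx = int.from_bytes(h, "big") % len(response_pool)
        fingerprint[p] = response_pool[idx]
    return fingerprint
```

The model owner would then fine-tune the model on these (prompt, response) pairs, per the abstract with random padding and varied meta-prompt configurations mixed in, and later prove ownership by revealing the prompts and showing the hashes reproduce the model's responses.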

Mark Russinovich, Ahmed Salem • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Attack Success Rate | CTCC fingerprinting scenario b | SVA | 100 | 18 |
| General Capability Harmlessness | General LLM Task Benchmark | Average Accuracy | 59.8 | 12 |
| Input Perturbation Robustness | Input Perturbation Remove 5% | FSR (5% Removal) | 100 | 10 |
| Fingerprinting Effectiveness | Fingerprinted Model Clean | FSR | 100 | 10 |
| Input Perturbation Robustness | Input Perturbation Remove 10% | FSR | 92 | 10 |
| Fine-tuning Robustness | ShareGPT | FSR | 1.00e+3 | 10 |
| Fine-tuning Robustness | Alpaca Dataset | FSR | 0.00e+0 | 10 |
| Fine-tuning Robustness | Dolly Dataset | FSR | 0.00e+0 | 10 |
| Input Stealthiness Assessment | Narrative-based corpus | PPL | 86.31 | 8 |
| Input Stealthiness Evaluation | Fingerprint Input Triggers (test) | Perplexity (PPL) | 86.31 | 6 |
(Showing 10 of 14 rows.)
