AttnDiff: Attention-based Differential Fingerprinting for Large Language Models
About
Protecting the intellectual property of open-weight large language models (LLMs) requires verifying whether a suspect model is derived from a victim model despite common laundering operations such as fine-tuning (including PPO/DPO), pruning/compression, and model merging. We propose AttnDiff, a data-efficient white-box framework that extracts fingerprints from models via intrinsic information-routing behavior. AttnDiff probes minimally edited prompt pairs that induce controlled semantic conflicts, captures differential attention patterns, summarizes them with compact spectral descriptors, and compares models using CKA. Across Llama-2/3 and Qwen2.5 (3B–14B) and additional open-source families, it yields high similarity for related derivatives while separating unrelated model families (e.g., >0.98 vs. <0.22 with M=60 probes). With 5–60 multi-domain probes, it supports practical provenance verification and accountability.
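The comparison step can be illustrated with a minimal sketch of CKA. The abstract names CKA but does not say which variant is used, so the linear form below is an assumption, and the feature matrices are synthetic stand-ins (one row per probe) rather than the paper's spectral descriptors. The function name `linear_cka` and all variable names are hypothetical.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices with one row per probe.

    Assumption: the linear variant of CKA; the abstract does not specify one.
    """
    # Center each feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Synthetic stand-ins for per-probe descriptors (not real model data).
rng = np.random.default_rng(0)
M, d = 60, 16  # M probes, d descriptor dimensions (d is an arbitrary choice)
base = rng.normal(size=(M, d))

# A "derived" model: rotated, rescaled, and slightly perturbed features.
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
derived = 1.7 * base @ rotation + 0.01 * rng.normal(size=(M, d))

# An "unrelated" model: independent random features.
unrelated = rng.normal(size=(M, d))

score_derived = linear_cka(base, derived)
score_unrelated = linear_cka(base, unrelated)
print(f"derived:   {score_derived:.4f}")
print(f"unrelated: {score_unrelated:.4f}")
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, which is one reason a CKA-style score can stay high for a derivative whose representations have been rotated or rescaled. The synthetic scores here only demonstrate that property and say nothing about the values reported for real models.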
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Model Fingerprinting Robustness Evaluation | Pruning Robustness Evaluation Dataset | Similarity Score: 1 | 127 |
| Model Fingerprinting Robustness | Structured Pruning Suspects Sheared-Llama | Similarity Score: 99.52 | 42 |
| Fingerprint Similarity | LLaMA2-7B | Similarity Score: 1 | 24 |
| Model Fingerprinting Robustness | Unstructured Pruning Suspects Llama-2-7b | Similarity Score: 99.96 | 21 |
| Model Fingerprinting | Qwen2.5-derived suspects v0.1 | Similarity Score: 0.9968 | 12 |
| Knowledge Distillation Robustness | Qwen2.5-14B teacher vs. DeepSeek-R1-Distill-Qwen-14B student (test) | Similarity Score: 98.75 | 7 |
| Knowledge Distillation Robustness | Llama-2-7B teacher vs. llama-2-7b-logit-watermark-distill-kgw-k1-gamma0.25-delta2 student (test) | Similarity Score: 99.98 | 7 |
| Model Fingerprinting | Llama-2 DPO 7B | Similarity Score: 99.94 | 7 |
| Model Fingerprinting Robustness | FuseLLM 7b Distribution Merging Openllama-2-7b | Similarity Score: 79.53 | 7 |
| Model Fingerprinting Robustness | Fusellm-7b Distribution Merging - Mpt-7b | Similarity Score: 78.51 | 7 |