AttnDiff: Attention-based Differential Fingerprinting for Large Language Models

About

Protecting the intellectual property of open-weight large language models (LLMs) requires verifying whether a suspect model is derived from a victim model despite common laundering operations such as fine-tuning (including PPO/DPO), pruning/compression, and model merging. We propose AttnDiff, a data-efficient white-box framework that extracts fingerprints from models via intrinsic information-routing behavior. AttnDiff probes minimally edited prompt pairs that induce controlled semantic conflicts, captures differential attention patterns, summarizes them with compact spectral descriptors, and compares models using CKA. Across Llama-2/3 and Qwen2.5 (3B to 14B) and additional open-source families, it yields high similarity for related derivatives while separating unrelated model families (e.g., >0.98 vs. <0.22 with M=60 probes). With 5 to 60 multi-domain probes, it supports practical provenance verification and accountability.
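
The final comparison step named in the abstract, CKA (centered kernel alignment), can be illustrated with a minimal sketch. The abstract does not say which CKA variant AttnDiff uses, so the linear-kernel form below is an assumption, and the function names (`gram`, `center`, `frob_inner`, `linear_cka`) and the toy descriptor values are hypothetical. Each input is taken to be a matrix with one row per probe and one column per descriptor feature; how the paper builds its spectral descriptors is not reproduced here.

```python
import math

def gram(X):
    """Linear Gram matrix: K[i][j] = dot(X[i], X[j]) over the rows of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def center(K):
    """Double-center a square matrix, i.e. H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    all_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + all_mean for j in range(n)]
            for i in range(n)]

def frob_inner(A, B):
    """Frobenius inner product: sum over i, j of A[i][j] * B[i][j]."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def linear_cka(X, Y):
    """Linear CKA between two (probes x features) descriptor matrices.

    Assumption: a linear kernel; the feature counts of X and Y may differ,
    but both must contain one row per probe, in the same probe order.
    """
    if len(X) != len(Y):
        raise ValueError("both models must be probed with the same M prompts")
    Kc, Lc = center(gram(X)), center(gram(Y))
    denom = math.sqrt(frob_inner(Kc, Kc) * frob_inner(Lc, Lc))
    if denom == 0.0:
        raise ValueError("descriptors are constant across probes; CKA is undefined")
    return frob_inner(Kc, Lc) / denom

# Hypothetical toy descriptors: M = 4 probes, 2 features each.
base = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]]
rescaled = [[2.0 * v for v in row] for row in base]   # same geometry, scaled
rotated = [[-row[1], row[0]] for row in base]         # same geometry, rotated
other = [[1.0], [-1.0], [1.0], [-1.0]]                # unrelated 1-D descriptor

print(linear_cka(base, rescaled))
print(linear_cka(base, rotated))
print(linear_cka(base, other))
```

Linear CKA is unchanged by uniform rescaling or rotation of the feature space, so the first two comparisons come out at 1 up to floating-point error, while the unrelated descriptor scores strictly lower. The toy values are not meant to reproduce the >0.98 and <0.22 figures reported in the abstract.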

Haobo Zhang, Zhenhua Xu, Junxian Li, Shangfeng Sheng, Dezhang Kong, Meng Han • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Model Fingerprinting Robustness Evaluation | Pruning Robustness Evaluation Dataset | Similarity Score: 1 | 127 |
| Model Fingerprinting Robustness | Structured Pruning Suspects Sheared-Llama | Similarity Score: 99.52 | 42 |
| Fingerprint Similarity | LLaMA2-7B | Similarity Score: 1 | 24 |
| Model Fingerprinting Robustness | Unstructured Pruning Suspects Llama-2-7b | Similarity Score: 99.96 | 21 |
| Model Fingerprinting | Qwen2.5-derived suspects v0.1 | Similarity Score: 0.9968 | 12 |
| Knowledge Distillation Robustness | Qwen2.5-14B teacher vs. DeepSeek-R1-Distill-Qwen-14B student (test) | Similarity Score: 98.75 | 7 |
| Knowledge Distillation Robustness | Llama-2-7B teacher vs. llama-2-7b-logit-watermark-distill-kgw-k1-gamma0.25-delta2 student (test) | Similarity Score: 99.98 | 7 |
| Model Fingerprinting | Llama-2 DPO 7B | Similarity Score: 99.94 | 7 |
| Model Fingerprinting Robustness | FuseLLM 7b Distribution Merging Openllama-2-7b | Similarity Score: 79.53 | 7 |
| Model Fingerprinting Robustness | Fusellm-7b Distribution Merging - Mpt-7b | Similarity Score: 78.51 | 7 |
Showing 10 of 26 rows
