
SRAF: Stealthy and Robust Adversarial Fingerprint for Copyright Verification of Large Language Models

About

The protection of Intellectual Property (IP) for Large Language Models (LLMs) has become a critical concern as model theft and unauthorized commercialization escalate. While adversarial fingerprinting offers a promising black-box solution for ownership verification, existing methods suffer from significant limitations: they are fragile against downstream model modifications, sensitive to system prompt variations, and easily detectable due to high-perplexity input patterns. In this paper, we propose SRAF, a stealthy and robust adversarial fingerprinting framework. SRAF employs a synergistic joint optimization strategy across homologous model variants and diverse chat templates, forcing the fingerprint to anchor onto the invariant intrinsic comprehension features of the model family. Furthermore, we introduce a Perplexity Hiding technique that embeds adversarial perturbations within Markdown tables, effectively aligning the prompt's statistics with natural language to evade perplexity-based detection. Extensive experiments on the Llama-2 model family demonstrate that SRAF significantly enhances robustness against fine-tuning, alignment, pruning, merging, and input perturbations while maintaining exceptional stealthiness and low false-positive rates, offering a practical and resilient black-box solution for LLM ownership verification.

Zhebo Wang, Zhenhua Xu, Maike Li, Wenpeng Xing, Chunqiang Hu, Chen Zhi, Meng Han • 2025
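
The abstract says that existing fingerprints are easy to detect because of their high-perplexity input patterns. As a rough illustration of what a perplexity-based detector checks, the sketch below computes perplexity from per-token log-probabilities and flags prompts above a threshold. It is not the paper's detector or its Perplexity Hiding method: the function names, the threshold, and the toy log-probabilities are all assumptions made for illustration.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def is_flagged(token_logprobs: Sequence[float], threshold: float) -> bool:
    """Hypothetical filter: flag a prompt whose perplexity exceeds the threshold."""
    return perplexity(token_logprobs) > threshold


# Toy log-probabilities, made up for illustration (not taken from the paper).
# In practice they would come from a reference language model scoring the prompt.
natural_like = [math.log(0.25)] * 8      # each token has probability 0.25
gibberish_like = [math.log(0.001)] * 8   # each token has probability 0.001

THRESHOLD = 100.0  # assumed cut-off; a real filter would calibrate this on data

print(is_flagged(natural_like, THRESHOLD))    # low perplexity, passes the filter
print(is_flagged(gibberish_like, THRESHOLD))  # high perplexity, gets flagged
```

A fingerprint prompt whose token statistics look like ordinary text would sit on the low-perplexity side of such a threshold. Evading this kind of check is what the abstract describes as the goal of embedding perturbations in Markdown tables.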

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Ownership Verification | Qwen 7B 2.5 (Anchor) | FSR | 0.00e+0 | 8 |
| Ownership Verification | Qwen-7B-Instruct SFT Variant 2.5 | FSR | 0.00e+0 | 8 |
| Ownership Verification | Math-TIES Merged Variant | FSR | 84 | 8 |
| Ownership Verification | Wanda 10% Pruned Variant | FSR | 0.00e+0 | 8 |
| Model Fingerprinting | System Prompt Variations | Fastchat Score | 96 | 6 |
| Model Fingerprinting | Character Dropping | Performance (-5% Dropped) | 26 | 6 |
| Ownership Verification | Llama-2-7B Random Pruning, 5% sparsity | False Success Rate (FSR) | 0.00e+0 | 6 |
| Ownership Verification | Llama-2-7B Random Pruning, 10% sparsity | FSR | 0.00e+0 | 6 |
| Ownership Verification | Llama-2-7B Taylor Pruning, 5% sparsity | FSR | 0.00e+0 | 6 |
| Ownership Verification | Llama-2-7B Taylor Pruning, 10% sparsity | FSR | 0.00e+0 | 6 |
Showing 10 of 13 rows
