Functional Subspace Watermarking for Large Language Models

About

Model watermarking utilizes internal representations to protect the ownership of large language models (LLMs). However, these features inevitably undergo complex distortions during realistic model modifications such as fine-tuning, quantization, or knowledge distillation, making reliable extraction extremely challenging. Despite extensive research on model-side watermarking, existing methods still lack sufficient robustness against parameter-level perturbations. To address this gap, we propose Functional Subspace Watermarking (FSW), a framework that anchors ownership signals into a low-dimensional functional backbone. Specifically, we first solve a generalized eigenvalue problem to extract a stable functional subspace for watermark injection, while introducing an adaptive spectral truncation strategy to achieve an optimal balance between robustness and model utility. Furthermore, a vector consistency constraint is incorporated to ensure that watermark injection does not compromise the original semantic performance. Extensive experiments across various LLM architectures and datasets demonstrate that our method achieves superior detection accuracy and statistical verifiability under multiple model attacks, maintaining robustness that outperforms existing state-of-the-art (SOTA) methods.
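The abstract names two steps, a generalized eigenvalue problem and a spectral truncation, without stating which matrices define the problem or how the cutoff is chosen. The following is only a minimal sketch of those two generic techniques, not the authors' implementation. It assumes a hypothetical setup in which A is the covariance of clean hidden activations, B is the covariance of the distortion those activations suffer under model modification, and the cutoff keeps the smallest number of directions covering a fixed share of the spectrum. The names `generalized_eigh`, `truncate_by_energy`, and the 0.9 threshold are illustrative.

```python
import numpy as np

def generalized_eigh(A, B):
    """Solve A v = lam * B v for symmetric A and symmetric positive-definite B."""
    L = np.linalg.cholesky(B)            # B = L L^T
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T              # equivalent standard symmetric problem
    C = (C + C.T) / 2                    # remove round-off asymmetry
    lam, U = np.linalg.eigh(C)           # eigenvalues in ascending order
    V = L_inv.T @ U                      # map back: v = L^{-T} u
    order = np.argsort(lam)[::-1]        # largest eigenvalue first
    return lam[order], V[:, order]

def truncate_by_energy(lam, energy=0.9):
    """Smallest k whose leading eigenvalues hold `energy` of the total (assumed rule)."""
    lam = np.clip(lam, 0.0, None)
    cumulative = np.cumsum(lam) / lam.sum()
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(lam))

# Synthetic stand-ins for real activations (hypothetical data).
rng = np.random.default_rng(0)
dim = 16
clean = rng.normal(size=(512, dim))              # assumed clean activations
distortion = 0.1 * rng.normal(size=(512, dim))   # assumed perturbation samples

A = clean.T @ clean / len(clean)
B = distortion.T @ distortion / len(distortion) + 1e-6 * np.eye(dim)

lam, V = generalized_eigh(A, B)
k = truncate_by_energy(lam, energy=0.9)
subspace = V[:, :k]   # candidate directions with high signal-to-distortion ratio
print(k, subspace.shape)
```

Under these assumptions, the leading generalized eigenvectors are the directions where activation variance is largest relative to distortion variance, which is one plausible reading of a "stable functional subspace". The energy threshold trades robustness (fewer, more stable directions) against capacity, and the paper's adaptive rule may differ.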

Zikang Ding, Junhao Li, Suling Wu, Junchi Yao, Hongbo Liu, Lijie Hu • 2026

Related benchmarks

Task	Dataset	Result
Language Modeling	WikiText-2	Perplexity (PPL) 5.91	2320
Commonsense Reasoning	HellaSwag	Accuracy 69	1896
Question Answering	ARC-E	Accuracy 77	523
Science Question Answering	ARC Challenge	Accuracy 77	354
Language Modeling	Perplexity	Perplexity (PPL) 5.91	149
Commonsense Reasoning	HellaSwag	HellaSwag Score 69	55
Watermark Detection	Watermarked Model Generations	Detection Score 6.09	10
