SLAM: Structural Linguistic Activation Marking for Language Models

About

LLM watermarks must be detectable without compromising text quality, yet most existing schemes bias the next-token distribution and pay for detection with measurable quality loss. We present SLAM (Structural Linguistic Activation Marking), a novel white-box watermarking scheme that sidesteps this cost by writing the mark into structural geometry rather than token frequencies: sparse autoencoders identify residual-stream directions encoding linguistic structure (e.g., voice, tense, clause order), and we causally steer those directions at generation time, leaving lexical sampling and semantics unconstrained. On Gemma-2 2B and 9B, SLAM achieves 100% detection accuracy with a quality cost of only 1-2 reward points (compared to 7.5-11.5 for KGW, EWD, and Unigram) and preserves naturalness and diversity at near-unwatermarked levels across both models. The trade-off is a complementary robustness profile: SLAM resists word-level edits but is vulnerable to paraphrase that restructures syntax, an attack that itself costs quality. Token-distribution methods show the converse profile.
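The steering idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes steering means adding a scaled unit direction to each residual-stream vector, and it assumes detection by thresholding the mean projection onto that direction, applied directly to hidden states. The names `steer`, `projection`, `detect`, the strength `alpha`, and the threshold are all hypothetical, and random vectors stand in for real model activations and SAE-derived directions.

```python
import math
import random


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def steer(hidden, direction, alpha):
    """Add alpha * direction to one hidden vector (assumed steering rule)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def projection(hidden, direction):
    """Scalar projection of a hidden vector onto a unit direction."""
    return sum(h * d for h, d in zip(hidden, direction))


def detect(hiddens, direction, threshold):
    """Flag a sequence if its mean projection exceeds the threshold (assumed detector)."""
    mean_proj = sum(projection(h, direction) for h in hiddens) / len(hiddens)
    return mean_proj > threshold


# Toy data: random vectors stand in for residual-stream activations.
rng = random.Random(0)
dim, seq_len, alpha = 16, 32, 4.0
direction = unit([rng.gauss(0, 1) for _ in range(dim)])
plain = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
marked = [steer(h, direction, alpha) for h in plain]

print("plain flagged: ", detect(plain, direction, alpha / 2))
print("marked flagged:", detect(marked, direction, alpha / 2))
```

Because the direction has unit length, steering shifts every projection by exactly `alpha`, so a threshold halfway between 0 and `alpha` separates the two toy sequences. A real detector would have to recover the signal from generated text rather than from the steered activations themselves.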

Fabrice Harel-Canada, Amit Sahai • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Watermark Robustness Analysis | Gemma-2-2B | Post-attack TPR: 100 | 49 |
| Watermarking Attack Robustness | Gemma 9B v2 (test) | TPR: 100 | 49 |
| Distribution-distance evaluation | Prompts 100 (evaluation) | Distinct-N (WM): 88.8 | 14 |
| Semantic similarity analysis | Gemma-2 within-prompt completions 2B | Cosine Distance: 0.35 | 8 |
| Semantic similarity analysis | Gemma-2 9B within-prompt completions | Cosine Distance: 0.359 | 8 |
| Watermark Detection Robustness | Gemma-2 9B Pre-trained (PT) (test) | TPR (Baseline): 100 | 7 |
| Watermarked text generation and detection | Gemma-2 9B Pre-trained | TPR: 100 | 7 |
| Watermark Detection Robustness | Gemma-2 2B Pre-trained (PT) (test) | TPR (None): 100 | 7 |
| Watermarked text generation and detection | Gemma-2 2B Pre-trained | TPR: 100 | 7 |
| Watermarked text generation and detection | Gemma-2 2B-IT | TPR: 99 | 1 |
Showing 10 of 11 rows
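
The page does not define the metrics reported above, so the following is only a sketch of their common textbook forms, which may differ from what the benchmarks actually compute. It assumes TPR is the percentage of watermarked samples the detector flags, Distinct-N is the percentage of unique n-grams in a token sequence, and cosine distance is one minus cosine similarity between two embedding vectors. All function names are hypothetical.

```python
import math


def true_positive_rate(flags):
    """Percent of watermarked samples that the detector flagged."""
    return 100.0 * sum(flags) / len(flags)


def distinct_n(tokens, n):
    """Percent of n-grams in the token list that are unique."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return 100.0 * len(set(ngrams)) / len(ngrams)


def cosine_distance(u, v):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Made-up inputs for illustration only; these are not the paper's data.
print(true_positive_rate([True, True, True, False]))
print(distinct_n("a b a b c".split(), 2))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```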
