SLAM: Structural Linguistic Activation Marking for Language Models
About
LLM watermarks must be detectable without compromising text quality, yet most existing schemes bias the next-token distribution and pay for detection with measurable quality loss. We present SLAM (Structural Linguistic Activation Marking), a white-box watermarking scheme that sidesteps this cost by writing the mark into structural geometry rather than token frequencies: sparse autoencoders identify residual-stream directions that encode linguistic structure (e.g., voice, tense, clause order), and we causally steer those directions at generation time, leaving lexical sampling and semantics unconstrained. On Gemma-2 2B and 9B, SLAM achieves 100% detection accuracy at a quality cost of only 1–2 reward points, compared to 7.5–11.5 for KGW, EWD, and Unigram, while preserving naturalness and diversity at near-unwatermarked levels on both models. The trade-off is a complementary robustness profile: SLAM resists word-level edits but is vulnerable to paraphrase that restructures syntax (at a quality cost), the converse of token-distribution methods.
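The steering idea in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the random `direction` stands in for a direction found by a sparse autoencoder, the random `plain` array stands in for residual-stream activations, and the names `steer`, `detection_score`, and the strength `alpha` are hypothetical. It only shows that adding a scaled unit direction shifts the mean projection onto that direction by the same amount, which is the kind of signal a white-box detector could read back.

```python
import numpy as np

def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)

def steer(hidden, direction, alpha):
    """Add alpha * unit(direction) to every row of hidden (tokens x d_model)."""
    return hidden + alpha * unit(direction)

def detection_score(hidden, direction):
    """Mean projection of the hidden states onto the unit direction."""
    return float(np.mean(hidden @ unit(direction)))

rng = np.random.default_rng(0)
d_model = 64
direction = rng.normal(size=d_model)       # assumption: stand-in for an SAE direction
plain = rng.normal(size=(32, d_model))     # assumption: stand-in activations
marked = steer(plain, direction, alpha=4.0)

# The score gap between marked and plain activations equals alpha.
gap = detection_score(marked, direction) - detection_score(plain, direction)
```

In this toy setting the gap is determined entirely by `alpha`; a real detector would have to recompute activations from the text and choose a threshold empirically.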
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Watermark Robustness Analysis | Gemma-2-2B | Post-attack TPR | 100 | 49 |
| Watermarking Attack Robustness | Gemma 9B v2 (test) | TPR | 100 | 49 |
| Distribution-distance evaluation | Prompts 100 (evaluation) | Distinct-N (WM) | 88.8 | 14 |
| Semantic similarity analysis | Gemma-2 within-prompt completions 2B | Cosine Distance | 0.35 | 8 |
| Semantic similarity analysis | Gemma-2 9B within-prompt completions | Cosine Distance | 0.359 | 8 |
| Watermark Detection Robustness | Gemma-2 9B Pre-trained (PT) (test) | TPR (Baseline) | 100 | 7 |
| Watermarked text generation and detection | Gemma-2 9B Pre-trained | TPR | 100 | 7 |
| Watermark Detection Robustness | Gemma-2 2B Pre-trained (PT) (test) | TPR (None) | 100 | 7 |
| Watermarked text generation and detection | Gemma-2 2B Pre-trained | TPR | 100 | 7 |
| Watermarked text generation and detection | Gemma-2 2B-IT | TPR | 99 | 1 |