Is Multilingual LLM Watermarking Truly Multilingual? Scaling Robustness to 100+ Languages via Back-Translation
About
Multilingual watermarking aims to make large language model (LLM) outputs traceable across languages, yet current methods still fall short. Despite claims of cross-lingual robustness, they have been evaluated only on high-resource languages. We show that existing multilingual watermarking methods are not truly multilingual: they fail to remain robust under translation attacks in medium- and low-resource languages. We trace this failure to semantic clustering, which breaks down when the tokenizer vocabulary contains too few full-word tokens for a given language. To address this, we introduce STEAM, a detection method that uses Bayesian optimisation to search among 133 candidate languages for the back-translation that best recovers the watermark strength. STEAM is compatible with any watermarking method, robust across different tokenizers and languages, non-invasive, and easily extendable to new languages. With average gains of +0.23 AUC and +37% TPR@1%, it provides a scalable approach toward fairer watermarking across the diversity of languages.
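The back-translation detection idea can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not STEAM itself: it scores every candidate language exhaustively instead of using Bayesian optimisation (which would decide which of the 133 languages to try under a limited evaluation budget), and `back_translation_detect`, `translate` and `detect_score` are hypothetical names standing in for a real machine-translation system and a real watermark detector.

```python
from typing import Callable, Iterable, Optional, Tuple


def back_translation_detect(
    text: str,
    candidate_langs: Iterable[str],
    translate: Callable[[str, str], str],
    detect_score: Callable[[str], float],
) -> Tuple[str, float]:
    """Back-translate `text` into each candidate language and return the language
    whose back-translation yields the highest watermark detection score.

    Hypothetical helper: exhaustive search stands in for Bayesian optimisation,
    and `translate` / `detect_score` are placeholders for an MT system and a
    watermark detector.
    """
    best_lang: Optional[str] = None
    best_score = float("-inf")
    for lang in candidate_langs:
        score = detect_score(translate(text, lang))
        if best_lang is None or score > best_score:
            best_lang, best_score = lang, score
    if best_lang is None:
        raise ValueError("candidate_langs must not be empty")
    return best_lang, best_score


# Toy stand-ins (purely illustrative): a fixed "green list" of words and a
# lookup-table "translator" that ignores its input text.
GREEN_WORDS = {"the", "cat", "sat"}
TOY_TRANSLATIONS = {"en": "the cat sat on a mat", "de": "der Hund lag auf dem Teppich"}


def toy_translate(text: str, lang: str) -> str:
    return TOY_TRANSLATIONS[lang]


def toy_score(text: str) -> float:
    # Fraction of words that fall in the green list.
    words = text.split()
    return sum(w in GREEN_WORDS for w in words) / len(words)


print(back_translation_detect("il gatto era seduto", ["de", "en"], toy_translate, toy_score))
# ('en', 0.5)
```

Because the sketch reports a maximum over several detection scores, a real detector would have to account for this when setting its decision threshold (taking a maximum over many candidates inflates scores on unwatermarked text); how STEAM handles this is not stated in the abstract.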
Related benchmarks
| Task | Dataset | AUC (%) | Rank |
|---|---|---|---|
| Translation Attack | LLaMA-3.2 1B, 17 supported languages | 97.9 | 68 |
| Watermark Detection | Aya-23 8B, 10 unsupported languages | 97.4 | 40 |
| Multilingual Watermarking Detection | Italian (it), unsupported language | 96.4 | 4 |
| Multilingual Watermarking Detection | Spanish (es), unsupported language | 96.1 | 4 |
| Multilingual Watermarking Detection | Portuguese (pt), unsupported language | 96.6 | 4 |
| Multilingual Watermarking Detection | Polish (pl), unsupported language | 96.5 | 4 |
| Multilingual Watermarking Detection | Dutch (nl), unsupported language | 95.7 | 4 |
| Multilingual Watermarking Detection | Croatian (hr), unsupported language | 96.6 | 4 |
| Multilingual Watermarking Detection | Czech (cs), unsupported language | 95.6 | 4 |
| Multilingual Watermarking Detection | Danish (da), unsupported language | 96.2 | 4 |