Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models
About
Text watermarking technology aims to tag and identify content produced by large language models (LLMs) to prevent misuse. In this study, we introduce the concept of cross-lingual consistency in text watermarking, which assesses the ability of text watermarks to maintain their effectiveness after being translated into other languages. Preliminary empirical results from two LLMs and three watermarking methods reveal that current text watermarking technologies lack consistency when texts are translated into various languages. Based on this observation, we propose a Cross-lingual Watermark Removal Attack (CWRA) to bypass watermarking by first obtaining a response from an LLM in a pivot language, which is then translated into the target language. CWRA can effectively remove watermarks, decreasing the AUCs to a random-guessing level without performance loss. Furthermore, we analyze two key factors that contribute to the cross-lingual consistency in text watermarking and propose X-SIR as a defense method against CWRA. Code: https://github.com/zwhe99/X-SIR.
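The CWRA pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `translate` and `generate_with_watermark` are hypothetical stand-ins (identity-style stubs here) for a machine translation system and a watermarked LLM, respectively.

```python
# Sketch of the Cross-lingual Watermark Removal Attack (CWRA):
# 1) translate the prompt into a pivot language,
# 2) obtain the (watermarked) LLM response in the pivot language,
# 3) translate the response back into the target language,
# which degrades the watermark signal embedded in the pivot-language text.

def translate(text: str, src: str, tgt: str) -> str:
    # Hypothetical stub: a real attack would call an MT system here.
    return text

def generate_with_watermark(prompt: str, lang: str) -> str:
    # Hypothetical stub: a real attack would query a watermarked LLM here.
    return f"[{lang}] response to: {prompt}"

def cwra(prompt: str, target_lang: str = "en", pivot_lang: str = "zh") -> str:
    """Get a response in a pivot language, then translate it back."""
    pivot_prompt = translate(prompt, src=target_lang, tgt=pivot_lang)
    pivot_response = generate_with_watermark(pivot_prompt, lang=pivot_lang)
    return translate(pivot_response, src=pivot_lang, tgt=target_lang)

print(cwra("Summarize this article."))
```

Because watermark detectors score the token distribution of the language the text was generated in, round-tripping through a pivot language is what pushes detection AUC toward random guessing.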
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Paraphrase Attack Robustness | BookSum | AUC 96.01 | 20 |
| Spoofing Attack Traceability | RTP-LX (test) | AUC 58.2 | 20 |
| Paraphrase Attack Robustness | C4 RealNewsLike | AUC 0.9224 | 20 |
| Spoofing Attack Robustness | BookSum | AUC 0.4921 | 20 |
| Spoofing Attack Traceability | RealToxicityPrompts (test) | AUC 54.41 | 20 |
| Spoofing Attack Robustness | C4 RealNewsLike | AUC 0.5069 | 20 |
| Text Summarization | Text Summarization | ROUGE-L 17.33 | 16 |
| Question Answering | Question Answering | ROUGE-1 0.1823 | 12 |
| Factual Knowledge | KoLA WaterBench (test) | GM 31.5 | 11 |
| Long-form QA | WaterBench (test) | GM Score 22.52 | 11 |