WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection

About

Embedding as a Service (EaaS) has become a widely adopted solution that offers feature extraction capabilities for various downstream tasks in Natural Language Processing (NLP). Prior studies have shown that EaaS can be prone to model extraction attacks; this concern can be mitigated by adding backdoor watermarks to the text embeddings and then verifying suspected attack models after publication. By analyzing EmbMarker, a recent watermarking strategy for EaaS, we design a novel CSE (Clustering, Selection, Elimination) attack that removes the backdoor watermark while preserving the high utility of the embeddings, indicating that the previous watermarking approach can be breached. In response to this new threat, we propose a new protocol that makes watermark removal more challenging by incorporating multiple possible watermark directions. Our defense approach, WARDEN, notably increases the stealthiness of watermarks and has been empirically shown to be effective against the CSE attack.

Anudeex Shetty, Yue Teng, Ke He, Qiongkai Xu • 2024
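
The abstract's idea of blending embeddings toward multiple watermark directions can be illustrated with a minimal sketch. This is not the authors' exact formulation: the function names, the `max_triggers` parameter, the per-direction trigger counts, and the linear blending rule are all assumptions chosen for illustration. The `delta_cosine` helper reflects one plausible reading of the "Delta Cosine" metric in the benchmark table, namely the gap in mean cosine similarity to a target direction between trigger texts and benign texts.

```python
import numpy as np


def unit(v):
    """Scale vectors along the last axis to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def add_watermark(embedding, trigger_counts, targets, max_triggers=4):
    """Blend one embedding toward several secret target directions.

    Hypothetical interface: trigger_counts[r] is how many words from the
    r-th trigger set occur in the input text; targets has shape (R, dim).
    The blending rule is an illustrative assumption, not the paper's formula.
    """
    # Per-direction weight in [0, 1]; grows with the number of trigger words.
    weights = np.minimum(np.asarray(trigger_counts, dtype=float) / max_triggers, 1.0)
    num_dirs = len(targets)
    # Keep (1 - mean weight) of the original and add each weighted direction.
    mixed = (1.0 - weights.mean()) * unit(embedding) + (weights / num_dirs) @ unit(targets)
    return unit(mixed)


def delta_cosine(backdoor_embs, benign_embs, target):
    """Mean cosine similarity to `target` on trigger texts minus that on benign texts."""
    t = unit(target)
    return float((unit(backdoor_embs) @ t).mean() - (unit(benign_embs) @ t).mean())
```

In this sketch, a text with no trigger words keeps its original (normalized) embedding, while a text rich in triggers from one set drifts toward that set's target. A verifier holding the secret targets would expect a clearly positive `delta_cosine` on a model trained on watermarked embeddings and a value near zero otherwise.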

Related benchmarks

Task	Dataset	Result
Text Classification	AG-News	Accuracy 93.86	248
Text Classification	SST2	Accuracy 93.5	71
Text Classification	MIND	Accuracy 77.43	48
Text Classification	AGNews	Accuracy 93.48	38
Text Classification	SST-2	Accuracy 93.5	24
Text Classification	Enron Spam	Accuracy (ACC) 95.88	21
Sentiment Analysis	SST2	Accuracy 94.04	20
Text Classification	SST2	Accuracy 90.46	10
Text Classification	ENRON	Accuracy 95.56	9
Watermark Detection	MIND	Delta Cosine (%) 0.55	7

Showing 10 of 14 rows
