SemBind: Binding Diffusion Watermarks to Semantics Against Black-Box Forgery Attacks

About

Latent-based watermarks, integrated into the generation process of latent diffusion models (LDMs), simplify detection and attribution of generated images. However, recent black-box forgery attacks, where an attacker needs at least one watermarked image and black-box access to the provider's model, can embed the provider's watermark into images not produced by the provider, posing outsized risk to provenance and trust. We propose SemBind, the first defense framework for latent-based watermarks that resists black-box forgery by binding latent signals to image semantics via a learned semantic masker. Trained with contrastive learning, the masker yields near-invariant codes for the same prompt and near-orthogonal codes across prompts; these codes are reshaped and permuted to modulate the target latent before any standard latent-based watermark. SemBind is generally compatible with existing latent-based watermarking schemes and keeps image quality essentially unchanged, while a simple mask-ratio parameter offers a tunable trade-off between anti-forgery strength and robustness. Across four mainstream latent-based watermark methods, our SemBind-enabled anti-forgery variants markedly reduce false acceptance under black-box forgery while providing a controllable robustness-security balance.
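The modulation step described above (a semantic code reshaped, permuted, and applied to the target latent, with a mask ratio controlling its strength) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `semantic_mask` and `apply_mask`, the use of a sign flip as the modulation, the key-seeded permutation, and the reading of `mask_ratio` as the fraction of latent positions the code may touch are all assumptions made for illustration. The learned semantic masker is replaced here by an arbitrary binary code.

```python
import numpy as np


def semantic_mask(code_bits, latent_shape, mask_ratio, key):
    """Expand a binary semantic code into a +/-1 mask over the latent.

    Assumptions (not from the paper): modulation is a sign flip, and
    mask_ratio is the fraction of latent positions the code may affect.
    """
    n = int(np.prod(latent_shape))
    rng = np.random.default_rng(key)

    # Reshape: tile the code so that it covers every latent position.
    reps = -(-n // code_bits.size)  # ceiling division
    tiled = np.tile(code_bits, reps)[:n]

    # Permute: a key-seeded shuffle spreads the code bits over the latent.
    tiled = tiled[rng.permutation(n)]

    # Only a mask_ratio fraction of positions is active.
    active = np.zeros(n, dtype=bool)
    n_active = int(round(mask_ratio * n))
    active[rng.choice(n, size=n_active, replace=False)] = True

    # Active positions whose code bit is 1 get their sign flipped.
    signs = np.where(active & (tiled == 1), -1.0, 1.0)
    return signs.reshape(latent_shape)


def apply_mask(latent, mask):
    """Modulate (or demodulate) a latent; a sign flip is its own inverse."""
    return latent * mask
```

Under these assumptions, a verifier that derives the same code from the image semantics undoes the modulation exactly, while a different code leaves up to a `mask_ratio` fraction of positions with the wrong sign. This mirrors the trade-off in the abstract: a larger ratio penalizes a semantic mismatch more, but also makes verification more sensitive to errors in the recovered code.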

Xin Zhang, Zijin Yang, Kejiang Chen, Linfeng Ma, Weiming Zhang, Nenghai Yu • 2026

Related benchmarks

Task	Dataset	Result
Imprinting Attack	COCO	Detection Rate: 0.00e+0	54
Imprint Forgery Attack	SDP prompt v1 (val)	Detection Rate: 49	48
Reprompt Forgery Attack	SDP prompt (val)	Detection Rate: 60	16
Reprompt Forgery Attack	COCO prompt (val)	Detection Rate: 50	16
Reprompt Forgery Attack	FredZhang7/stable-diffusion-prompts SD 2.1 attacker 2.47M	Detection Rate: 56	8
Reprompt Forgery Attack	FredZhang7/stable-diffusion-prompts-2.47M SD 1.5 attacker	Detection Rate: 50	8
Imprinting Attack	SDP	Detection Rate: 0.00e+0	6
Reprompting Attack	COCO	Detection Rate: 1	2
Reprompting Attack	SDP	Detection Rate: 0.00e+0	2

