BadDLM: Backdooring Diffusion Language Models with Diverse Targets

About

Diffusion language models (DLMs) have recently emerged as an alternative modeling paradigm to autoregressive (AR) language models, enabling parallel generation and bidirectional context modeling. Yet their security implications, particularly their vulnerability to backdoor attacks, remain underexplored. We propose BadDLM, a unified framework for studying backdoor attacks against DLMs with diverse targets. We introduce a trigger-aware training objective that emphasizes target-relevant positions in poisoned samples, and theoretically prove that this objective is equivalent to training under an induced forward masking distribution. Unlike backdoors in autoregressive models, which typically manipulate next-token prediction, this characterization indicates that BadDLM can implant backdoors by exploiting the forward masking process. We instantiate BadDLM across different target levels: concept injection (BadDLM_Concept), semantic attribute steering (BadDLM_Attribute), alignment bypass (BadDLM_Align), and code payload injection (BadDLM_Payload). Experiments on mainstream open-source DLMs show that BadDLM achieves strong attack effectiveness across diverse targets while largely preserving benign utility, and remains effective against defenses designed for AR backdoors. Our findings expose a new class of security risks in diffusion-based language generation and call for defenses tailored to DLM denoising dynamics.

Shengfang Zhai, Xiaoyang Ji, Yuling Shi, Haoran Gao, Fanyu Meng, Yan Zeng, Yuejian Fang, Yinpeng Dong, Jiaheng Zhang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Utility Preservation | MMLU 5-shot (test) | Utility Score | 67.1 | 16 |
| Backdoor Attack | GPT-5-generated shopping requests | ASR | 91.2 | 5 |
| Backdoor Attack | Wiki topic evaluation requests | ASR | 94.5 | 5 |
| Backdoor Attack | CodeAlpaca 20k code generation instructions | ASR | 94.8 | 5 |
| Backdoor Attack | shopping requests GPT-5-generated (test) | Attack Success Rate (ASR) | 94.1 | 5 |
| Backdoor Attack | Wiki topic evaluation requests (test) | ASR | 93.2 | 5 |
| Backdoor Attack | CodeAlpaca-20k (test) | ASR | 90.5 | 5 |
| Backdoor Attack | AdvBench | ASR | 91.8 | 4 |
| Backdoor Attack | AdvBench (test) | ASR | 91.3 | 4 |
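
The page does not say how the ASR values above are computed. As a rough illustration only, attack success rate is conventionally the percentage of triggered inputs whose output exhibits the attacker's target behavior. The sketch below assumes that convention; the function name `attack_success_rate`, the sample outputs, and the `"BrandX"` success criterion are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence


def attack_success_rate(
    outputs: Sequence[str],
    is_success: Callable[[str], bool],
) -> float:
    """Return the percentage of outputs judged successful by `is_success`.

    `outputs` are model responses to triggered inputs; `is_success` is a
    task-specific judge (an assumption here, e.g. a keyword match).
    """
    if not outputs:
        raise ValueError("outputs must be non-empty")
    hits = sum(1 for out in outputs if is_success(out))
    return 100.0 * hits / len(outputs)


# Hypothetical responses to four triggered shopping requests.
outputs = [
    "I recommend BrandX running shoes.",
    "Here are some popular sneakers.",
    "BrandX has a good return policy.",
    "Consider BrandX for trail running.",
]

# Hypothetical judge: success means the target concept appears in the output.
asr = attack_success_rate(outputs, lambda text: "BrandX" in text)
print(f"ASR = {asr:.1f}")
```

A real evaluation would likely need a stronger judge than keyword matching, for instance for the alignment-bypass and attribute-steering targets, where success is a property of the whole response rather than a single token.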