Machine Unlearning for Masked Diffusion Language Models

About

Recent masked diffusion language models (MDLMs), such as LLaDA and Dream, have achieved performance comparable to autoregressive large language models. Unlike autoregressive models, which generate text sequentially, MDLMs generate text by iteratively denoising masked positions in parallel. During fine-tuning, MDLMs learn to recover responses from masked response states conditioned on a prompt, thereby shifting their predictions from a prompt-masked unconditional distribution toward a prompt-conditional distribution. Despite this distinct generative and fine-tuning mechanism, machine unlearning for MDLMs remains largely unexplored. In this paper, we propose Masked Diffusion Unlearning (MDU), the first unlearning framework for MDLMs, by revisiting the process of learning specific knowledge in terms of diffusion. Specifically, MDU minimizes a forward KL divergence from the prompt-conditional prediction to a prompt-masked unconditional anchor at every masked response position, with a temperature scaling parameter to control the privacy-utility trade-off. Our empirical results on standard benchmarks and MDLM backbones show that MDU achieves high unlearning performance compared to existing LLM unlearning methods. Code is available at https://github.com/leegeoru/MDU.
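To make the objective in the abstract concrete, here is a minimal toy sketch of a per-position KL loss between a prompt-conditional prediction and a prompt-masked anchor. It is not taken from the MDU repository. The function names, the choice to apply the temperature to the anchor logits, and the KL argument order (anchor first, treated as a fixed target) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one vocabulary-sized logit vector."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def mdu_style_loss(cond_logits, anchor_logits, masked_positions, temperature=1.0):
    """Mean KL(anchor || conditional) over the masked response positions.

    cond_logits[pos]   : logits when the model sees the prompt (hypothetical input)
    anchor_logits[pos] : logits when the prompt is masked out (hypothetical input)
    temperature        : ASSUMED to soften or sharpen the anchor only
    """
    if not masked_positions:
        return 0.0
    total = 0.0
    for pos in masked_positions:
        anchor = softmax(anchor_logits[pos], temperature)  # fixed target
        cond = softmax(cond_logits[pos])                   # prediction being pushed
        total += kl_divergence(anchor, cond)
    return total / len(masked_positions)

# Toy example: two response positions, vocabulary of size 3.
cond = [[2.0, 0.0, -2.0], [0.5, 0.5, 0.5]]
anchor = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
loss = mdu_style_loss(cond, anchor, masked_positions=[0, 1])
```

In this sketch the loss is zero wherever the conditional prediction already matches the anchor (position 1) and positive where the prompt still shifts the prediction (position 0). In a real training setup the anchor would presumably be computed without gradients and the loss minimized with respect to the model parameters, but that detail is also an assumption here.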

Georu Lee, Seungwon Jeong, Hoki Kim, Jinseong Park, Woojin Lee • 2026

Related benchmarks

Task	Dataset	Result
Machine Unlearning	TOFU 1.0 (Retain Set)	ROUGE-L 93.1	48
Machine Unlearning	TOFU 1.0 (Real Author)	ROUGE-L 64.5	45
Machine Unlearning	TOFU World Fact 1.0	ROUGE-L 0.848	34
Unlearning	TOFU forget10 (forget)	rL 0.284	24
Machine Unlearning	RWKU	Fidelity L126.2	22

