
Elastic-dLLM: Position Preserving Context Compression and Augmentation of Diffusion LLMs

About

Unlike autoregressive models, which generate one token at a time, diffusion LLMs (dLLMs) denoise a chunk of [MASK] tokens jointly and sample one or more tokens per step. Although this enables parallel decoding, it incurs substantial computational cost because the chunk of masked tokens is large. We observe that much of this cost is spent on repeatedly processing the preceding context and many [MASK] tokens that share the same feature representations, indicating considerable computational redundancy. In this work, we revisit redundancy in dLLMs from the perspective of [MASK] tokens. Through systematic analysis, we verify that [MASK] tokens are redundant while revealing their critical role in providing structural information. Guided by these findings, we propose position-preserving [MASK] token compression and terminal-aware augmentation. For full-sequence dLLMs such as LLaDA-8B-Instruct and LLaDA-1.5, compressing redundant [MASK] computation accelerates decoding and extends naturally toward context-folding-like long-context scaling under limited input-length constraints. For block dLLMs such as LLaDA2.0-mini, the approach augments the context with a protected terminal [MASK] token to enhance generation quality with negligible overhead.
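To make the idea of position-preserving compression concrete, here is a minimal sketch. It is not the authors' implementation: the function name `compress_masks`, the `keep` parameter, and the rule of keeping the first few [MASK] tokens plus the terminal one are all assumptions made for illustration. The only point it shows is that surviving tokens keep their original position indices after redundant [MASK] tokens are dropped.

```python
from typing import List, Tuple

MASK = "[MASK]"


def compress_masks(
    tokens: List[str], keep: int, keep_terminal: bool = True
) -> Tuple[List[str], List[int]]:
    """Drop most [MASK] tokens while keeping original position ids.

    Hypothetical selection rule (an assumption, not taken from the paper):
    keep the first `keep` [MASK] tokens and, optionally, the last one.
    """
    mask_positions = [i for i, t in enumerate(tokens) if t == MASK]
    kept = set(mask_positions[:keep])
    if keep_terminal and mask_positions:
        kept.add(mask_positions[-1])  # protect the terminal [MASK]

    out_tokens: List[str] = []
    out_positions: List[int] = []
    for i, t in enumerate(tokens):
        # Context tokens always survive; [MASK] tokens survive only if selected.
        if t != MASK or i in kept:
            out_tokens.append(t)
            out_positions.append(i)  # original index, not the compressed index
    return out_tokens, out_positions


# Four context tokens followed by a chunk of eight [MASK] tokens.
sequence = ["What", "is", "2+2", "?"] + [MASK] * 8
short_tokens, position_ids = compress_masks(sequence, keep=2)
print(len(sequence), len(short_tokens))
print(position_ids)
```

In this sketch the model would see 7 tokens instead of 12, while the position ids still span the full length of the original sequence, so the retained [MASK] tokens continue to carry the structural information the abstract describes.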

Junyi Wu, Tianchen Zhao, Shaoqiu Zhang, Linfeng Zhang, Guohao Dai, Yu Wang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Mathematical Reasoning | MATH 500 | Accuracy (Acc) 49.6 | 543 |
| Mathematical Reasoning | ASDIV | Accuracy 0.8672 | 263 |
| Mathematical Reasoning | Countdown | Accuracy 65.62 | 252 |
| Code Generation | HumanEval | Accuracy 41.46 | 217 |
| Mathematical Reasoning | GSM8K | -- | 204 |
| Code | HumanEval | HumanEval Accuracy 77.44 | 109 |
| Instruction Following | IFEval | Accuracy (IFEval) 58.04 | 86 |
| Instruction | IFEval | Score 85.03 | 17 |
| Long-form generation | LongWriter-Bench | Success Rate (Sl) [0, 500) 94.9 | 4 |
