VDLM: Variable Diffusion LMs via Robust Latent-to-Text Rendering
About
Autoregressive language models decode left-to-right with irreversible commitments, limiting revision during multi-step reasoning. We propose **VDLM**, a modular variable diffusion language model that separates semantic planning from text rendering. VDLM applies LLaDA-style masked diffusion over semantic variable embeddings to enable iterative refinement in latent space, then post-trains the planner with trajectory-aware optimization using embedding-space rewards and values, avoiding text decoding inside the RL loop. To convert planned embeddings back to text, we use a **Vec2Text** renderer and introduce **embedding perturbations** to robustify decoding under planner noise. Across nine benchmarks spanning general reasoning, math, and code, VDLM is competitive in pre-training and yields substantial post-training improvements on long-form generation tasks, outperforming comparable baselines. These results highlight the effectiveness of embedding-space post-training and robust latent-to-text rendering for diffusion language modeling.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Code Generation | HumanEval | -- | 850 |
| Mathematical Reasoning | GSM8K (test) | Accuracy: 89.8 | 797 |
| Language Understanding | MMLU | Accuracy: 71.4 | 756 |
| Code Generation | HumanEval (test) | -- | 444 |
| Mathematical Reasoning | MATH (test) | Overall Accuracy: 62.4 | 433 |
| Physical Commonsense Reasoning | PIQA | Accuracy: 74.2 | 329 |
| Science Reasoning | GPQA | Accuracy: 25.6 | 218 |
| Science Question Answering | ARC-C | Accuracy: 54.4 | 127 |
| Truthfulness Evaluation | TruthfulQA | Accuracy: 50.2 | 93 |
| Logical Reasoning | BBH | Accuracy: 54.5 | 93 |