Generalized Discrete Diffusion with Self-Correction

About

Self-correction is an effective technique for enabling parallel sampling in discrete diffusion models with minimal performance degradation. Prior work has explored self-correction at inference time or during post-training; however, such approaches often generalize poorly and may impair reasoning performance. GIDD pioneered pretraining-based self-correction via a multi-step BERT-style uniform-absorbing objective. However, GIDD relies on a continuous interpolation-based pipeline with opaque interactions between uniform transitions and absorbing masks, which complicates hyperparameter tuning and hinders practical performance. In this work, we propose a Self-Correcting Discrete Diffusion (SCDD) model that reformulates pretraining-based self-correction with explicit state transitions and learns directly in discrete time. Our framework also simplifies the training noise schedule, eliminates a redundant remasking step, and relies exclusively on uniform transitions to learn self-correction. Experiments at the GPT-2 scale demonstrate that our method enables more efficient parallel decoding while preserving generation quality.
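
As a rough illustration of the discrete-time, uniform-transition formulation described in the abstract, the sketch below corrupts tokens with purely uniform replacements (no absorbing mask) and re-predicts every position at each reverse step, so tokens committed earlier can still be revised. This is a minimal sketch under stated assumptions: the names (`uniform_corrupt`, `reverse_step`), the `model(x_t, t)` interface, and the linear corruption schedule are illustrative, not the paper's exact design.

```python
import torch

def uniform_corrupt(x0, t, num_steps, vocab_size):
    """Forward process: replace each token with a uniformly random one
    with probability t / num_steps (assumed linear schedule).

    x0: LongTensor of token ids, shape (batch, seq_len)
    t:  integer diffusion step in [0, num_steps]
    """
    corrupt_prob = t / num_steps
    noise = torch.randint_like(x0, vocab_size)            # uniform replacement tokens
    replace = torch.rand(x0.shape, device=x0.device) < corrupt_prob
    return torch.where(replace, noise, x0)

@torch.no_grad()
def reverse_step(model, x_t, t):
    """One parallel denoising step: every position is re-predicted, so
    earlier mistakes can be overwritten (self-correction)."""
    logits = model(x_t, t)                                # (batch, seq_len, vocab)
    probs = torch.softmax(logits, dim=-1)
    return torch.distributions.Categorical(probs).sample()
```

Because the reverse step resamples all positions rather than only masked ones, a uniform-transition sampler of this form can overwrite its own errors; an absorbing-state sampler, by contrast, typically freezes tokens once they are unmasked.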

Linxuan Wang, Ziyi Wang, Yikun Bai, Wei Deng, Guang Lin, Qifan Song • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Language Modeling | OWT | Gen. PPL | 55.7 | 61
Language Modeling | LM1B | PPL (Generalized) | 102.6 | 55
Language Modeling | LM1B (val) | Perplexity | 39.16 | 55
Language Modeling | OpenWebText (OWT) (val) | Perplexity | 28.41 | 42
Language Modeling | Language Modeling Benchmarks (ARC, BoolQ, Hellaswag, OBQA, PIQA, WinoG), zero-shot | ARC-E Accuracy | 26.64 | 5
