
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation

About

We propose SDAR, a Synergistic Diffusion-Autoregression paradigm that unifies the training efficiency of autoregressive models with the parallel inference capability of diffusion. Instead of costly end-to-end diffusion training, SDAR performs a lightweight paradigm conversion that transforms a well-trained autoregressive (AR) model into a blockwise diffusion model through brief, data-efficient adaptation. During inference, SDAR generates sequences autoregressively across blocks for global coherence while decoding all tokens within each block in parallel via a discrete diffusion process. Extensive experiments show that AR models remain substantially more compute-efficient than masked diffusion models, providing a strong foundation for adaptation. Building on this insight, SDAR achieves efficient AR-to-diffusion conversion with minimal cost, preserving AR-level performance while enabling parallel generation. Scaling studies across dense and Mixture-of-Experts architectures confirm that SDAR scales without compromise: larger models exhibit stronger robustness to block size and decoding thresholds, yielding greater speedups without accuracy loss. Beyond efficiency, SDAR demonstrates enhanced reasoning and domain adaptability. Our 30B MoE model surpasses its AR counterpart on challenging scientific reasoning benchmarks such as GPQA and ChemBench, and gains further improvements under test-time scaling methods like majority voting and pass@k. Together, these results establish SDAR as a practical paradigm that combines the strengths of autoregression and diffusion for scalable, high-throughput reasoning.
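The decoding scheme described above (autoregressive across blocks, parallel confidence-thresholded unmasking within each block) can be sketched in toy form. Everything here is an illustrative assumption, not the authors' implementation: `score` is a deterministic stand-in for the model's per-position prediction, and the confidence rule and threshold are made up for the sketch.

```python
# Toy sketch of SDAR-style blockwise diffusion decoding.
# Hypothetical scorer and confidence rule; NOT the paper's actual model.
MASK = -1  # sentinel for a not-yet-decoded token


def score(prefix, block, pos):
    """Stand-in for the model: propose (token, confidence) for a masked
    position, conditioned on the committed prefix and the partially
    filled block. Confidence grows with available in-block context."""
    context = sum(t != MASK for t in block)
    token = (len(prefix) + pos) % 10          # toy deterministic "prediction"
    confidence = 0.5 + 0.5 * context / len(block)
    return token, confidence


def decode(num_blocks, block_size, threshold=0.6):
    seq = []
    for _ in range(num_blocks):               # autoregressive across blocks
        block = [MASK] * block_size
        while MASK in block:                  # diffusion-style refinement loop
            masked = [i for i, t in enumerate(block) if t == MASK]
            proposals = {i: score(seq, block, i) for i in masked}
            # Accept every position whose confidence clears the threshold...
            accepted = [i for i, (_, c) in proposals.items() if c >= threshold]
            if not accepted:
                # ...but always commit at least the single most confident one,
                # so the loop is guaranteed to make progress.
                accepted = [max(masked, key=lambda i: proposals[i][1])]
            for i in accepted:                # accepted tokens fill in parallel
                block[i] = proposals[i][0]
        seq.extend(block)                     # commit block, move to the next
    return seq
```

With this toy scorer, the first pass over each fresh block commits one token, after which the remaining positions clear the threshold and fill in a single parallel step, mirroring how a lower decoding threshold trades refinement steps for throughput.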

Shuang Cheng, Yihan Bian, Dawei Liu, Linfeng Zhang, Qian Yao, Zhongbo Tian, Wenhai Wang, Qipeng Guo, Kai Chen, Biqing Qi, Bowen Zhou • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Pass@1 | 78.7 | 850 |
| Language Understanding | MMLU | Accuracy | 78.6 | 756 |
| Mathematical Reasoning | MATH | Accuracy | 78.6 | 643 |
| Code Generation | HumanEval (test) | -- | -- | 444 |
| Mathematical Reasoning | MATH500 (test) | Accuracy | 54.4 | 381 |
| Mathematical Reasoning | GSM8K | Accuracy | 91.3 | 358 |
| Instruction Following | IFEval | Accuracy (0-100) | 61.4 | 292 |
| Code Generation | MBPP (test) | -- | -- | 276 |
| Mathematical Reasoning | AIME 25 | Accuracy | 14.79 | 201 |
| Mathematical Reasoning | GSM8K | Speed Up (x) | 1 | 177 |

Showing 10 of 52 rows.
