
Scaling Diffusion Language Models via Adaptation from Autoregressive Models

About

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale than their AR counterparts and lack fair comparison on language modeling benchmarks. Additionally, training diffusion models from scratch at scale remains challenging. Given the prevalence of open-source AR language models, we propose adapting these models to build text diffusion models. We demonstrate connections between AR and diffusion modeling objectives and introduce a simple continual pre-training approach for training diffusion models. Through systematic evaluation on language modeling, reasoning, and commonsense benchmarks, we show that we can convert AR models ranging from 127M to 7B parameters (GPT2 and LLaMA) into diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training. Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts. We release a suite of DLMs (127M-355M-7B) capable of generating fluent text, performing in-context learning, filling in the middle without prompt re-ordering, and following instructions. Code and models are available at https://github.com/HKUNLP/DiffuLLaMA.

Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong • 2024
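
The abstract mentions a connection between AR and diffusion training objectives without spelling it out. As a rough illustration only, the sketch below uses the absorbing-state (masked) formulation of discrete text diffusion: each token is independently replaced by a mask token with probability t, and the loss is the cross-entropy on masked positions, reweighted by 1/t, a weighting commonly used in masked diffusion objectives. This is an assumption for illustration and may not match the paper's exact objective; the `MASK` token, the function names, and the toy `probs` list (standing in for a real network's output) are all hypothetical.

```python
import math
import random

MASK = "[MASK]"  # hypothetical absorbing-state token (assumption)

def forward_mask(tokens, t, rng):
    """Noise a sequence: replace each token with MASK with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]

def masked_diffusion_loss(noised, probs, t):
    """Cross-entropy over masked positions only, reweighted by 1/t.

    probs[i] is the probability the (toy) model assigns to the true
    token at position i; a real model would compute this from `noised`.
    """
    nll = sum(-math.log(probs[i]) for i, tok in enumerate(noised) if tok == MASK)
    return nll / t

rng = random.Random(0)
tokens = ["the", "cat", "sat", "on", "the", "mat"]
t = 0.5  # noise level in (0, 1]
noised = forward_mask(tokens, t, rng)
probs = [0.9] * len(tokens)  # toy model: 0.9 confidence at every position
loss = masked_diffusion_loss(noised, probs, t)
print(noised, round(loss, 4))
```

At t = 1 every position is masked and every token contributes to the loss, which is the sense in which this objective resembles a token-level cross-entropy like the one AR models are trained with. The key difference is that predictions condition on a partially masked sequence rather than on a left-to-right prefix.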

Related benchmarks

Task                       Dataset     Metric            Result  Rank
Commonsense Reasoning      PIQA        Accuracy          63.3    757
Math Reasoning             GSM8K       Accuracy          58.5    254
Commonsense Reasoning      SIQA        Accuracy          43.2    168
Commonsense Reasoning      Wino        Accuracy          56.4    146
Commonsense Reasoning      WinoGrande  Accuracy          52.6    103
Common Sense Reasoning     PIQA        Accuracy          59.6    100
Common Sense Reasoning     HSWAG       Accuracy          0.587   52
Common Sense Reasoning     HellaSwag   Accuracy (acc_n)  37.2    47
Question Answering         TriQA       Accuracy          18.5    47
Generative Recommendation  Beauty      R@10              7.18    28
Showing 10 of 24 rows
