
Continuous diffusion for categorical data

About

Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous nature of diffusion models conveys many benefits, and in this work we endeavour to preserve it. We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space. We demonstrate its efficacy on several language modelling tasks.

Sander Dieleman, Laurent Sartran, Arman Roshannai, Nikolay Savinov, Yaroslav Ganin, Pierre H. Richemond, Arnaud Doucet, Robin Strudel, Chris Dyer, Conor Durkan, Curtis Hawthorne, Rémi Leblond, Will Grathwohl, Jonas Adler • 2022

Related benchmarks

Task                 Dataset         Result          Rank
Machine Translation  WMT En-De '14   BLEU 19.7       89
Machine Translation  WMT De-En 14    BLEU 25.4       33
Machine Translation  WMT14 DE-EN     SacreBLEU 25.4  13
Machine Translation  WMT En-De '14   SacreBLEU 19.7  12
