Discrete Flow Matching
About
Although Flow Matching and diffusion models have emerged as powerful generative paradigms for continuous variables such as images and videos, their application to high-dimensional discrete data, such as language, is still limited. In this work, we present Discrete Flow Matching, a novel discrete flow paradigm designed specifically for generating discrete data. Discrete Flow Matching offers several key contributions: (i) it works with a general family of probability paths interpolating between source and target distributions; (ii) it allows for a generic formula for sampling from these probability paths using learned posteriors such as the probability denoiser ($x$-prediction) and noise-prediction ($\epsilon$-prediction); (iii) practically, focusing on specific probability paths defined with different schedulers improves generative perplexity compared to previous discrete diffusion and flow models; and (iv) by scaling Discrete Flow Matching models up to 1.7B parameters, we reach 6.7% Pass@1 and 13.4% Pass@10 on HumanEval and 6.7% Pass@1 and 20.6% Pass@10 on 1-shot MBPP coding benchmarks. Our approach is capable of generating high-quality discrete data in a non-autoregressive fashion, significantly closing the gap between autoregressive models and discrete flow models.
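Contributions (i) and (ii) can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes a per-token mixture path in which each token is drawn from the target with probability $\kappa_t$ and from the source otherwise, a linear scheduler $\kappa_t = t$, an all-mask source, and a toy "oracle" denoiser that stands in for a learned $x$-prediction posterior. The names `MASK`, `kappa`, `sample_path`, `euler_step`, and `generate` are hypothetical.

```python
import numpy as np

MASK = 0  # assumed id of a mask token used as the source distribution

def kappa(t):
    """Assumed linear scheduler with kappa(0) = 0 and kappa(1) = 1."""
    return t

def kappa_dot(t):
    """Time derivative of the linear scheduler."""
    return 1.0

def sample_path(x0, x1, t, rng):
    """Sample X_t: each token comes from x1 with probability kappa(t), else from x0."""
    from_target = rng.random(x0.shape) < kappa(t)
    return np.where(from_target, x1, x0)

def euler_step(x_t, probs_x1, t, h, rng):
    """One sampling step using a posterior over clean tokens (x-prediction).

    probs_x1 has shape (length, vocab); row i is the predicted distribution
    of the clean token at position i given the current sequence x_t.
    """
    # Per-token probability of resampling within [t, t + h], clipped to 1.
    jump_prob = min(1.0, h * kappa_dot(t) / (1.0 - kappa(t)))
    jump = rng.random(x_t.shape) < jump_prob
    vocab = probs_x1.shape[1]
    proposal = np.array([rng.choice(vocab, p=p) for p in probs_x1])
    return np.where(jump, proposal, x_t)

def generate(x1_oracle, vocab, n_steps, rng):
    """Run the sampler from an all-mask source with a toy oracle denoiser."""
    x = np.full_like(x1_oracle, MASK)
    probs = np.eye(vocab)[x1_oracle]  # oracle: all mass on the true token
    for k in range(n_steps):
        x = euler_step(x, probs, k / n_steps, 1.0 / n_steps, rng)
    return x

rng = np.random.default_rng(0)
x1 = np.array([5, 3, 7, 2, 9, 4])
x0 = np.full_like(x1, MASK)
print(sample_path(x0, x1, 0.5, rng))  # each entry is either MASK or the target token
print(generate(x1, vocab=10, n_steps=10, rng=rng))
```

In a real model the oracle rows of `probs` would be replaced by the network's predicted distribution over clean tokens, recomputed from the current sequence at every step. Swapping `kappa` for another monotone scheduler changes how quickly tokens move from source to target along the path.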
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Unconditional Image Generation | CIFAR-10 unconditional | FID | 3.63 | 209 |
| Text Generation | OpenWebText | Perplexity | 146.5 | 142 |
| Text Generation | LM1B (test) | Entropy | 3.79 | 85 |
| Image Generation | CIFAR-10 (train/test) | FID | 3.63 | 78 |
| Molecule Generation | GuacaMol | Validity | 86.6 | 28 |
| Unconditional Image Generation | MNIST Binary | FID | 34.42 | 25 |
| Text Generation | WikiText-103 | Perplexity | 69.06 | 23 |
| Molecule Generation | MOSES | Validity | 88.3 | 19 |
| Unconditional Generation | FineWeb-Edu Mask source 170M-parameter (train) | Entropy | 8.5 | 17 |
| Unconditional Generation | FineWeb-Edu Uniform source 170M-parameter (train) | Entropy | 7.8 | 17 |