
Flow Map Language Models: One-step Language Modeling via Continuous Denoising

About

Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. Despite their promise, these models typically produce samples whose quality sharply degrades in the few-step regime, preventing a dramatic speedup in practice. Here, we show that language models based on continuous flows over one-hot token embeddings can outperform discrete diffusion in both quality and speed. Importantly, our continuous formulation defines a unique flow map that can be learned directly for efficient few-step inference, a structure we show is unavailable to discrete methods. In this setting, we show that both the flow and its associated flow map can be learned with simple cross-entropy objectives that respect the simplex geometry of the data, and we identify three distinct choices for flow map distillation whose performance we compare in practice. Using these insights, we build a flow language model (FLM), a continuous flow that matches state-of-the-art discrete diffusion baselines on the One Billion Words (LM1B) and OpenWebText (OWT) datasets. We then distill FLM into a flow map language model (FMLM), whose one-step generation exceeds the 8-step quality of recent few-step discrete diffusion language models. Our work challenges the widely-held hypothesis that discrete noising processes are necessary for generative modeling over discrete modalities and paves the way toward accelerated language modeling at scale. Code is available at https://github.com/david3684/flm.
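To make the abstract's core idea concrete, here is a minimal, hedged sketch of the pipeline it describes: tokens are embedded as one-hot vertices of the simplex, noised along a continuous path, and a flow map takes a noisy state directly to clean-token logits in one step, trained with a simplex-respecting cross-entropy. Everything below is illustrative only — the interpolation path, the `flow_map` stand-in (an untrained linear head), and all names are assumptions, not the paper's actual architecture or objective.

```python
import numpy as np

rng = np.random.default_rng(0)
V, L = 8, 4                               # toy vocabulary size and sequence length

# Clean data: one-hot token embeddings, i.e. vertices of the simplex.
tokens = rng.integers(0, V, size=L)
e = np.eye(V)[tokens]                     # (L, V) one-hot rows

# A linear interpolant between data and Gaussian noise (hypothetical choice;
# the paper's exact continuous path may differ).
def interpolate(e, z, t):
    return (1.0 - t) * e + t * z

z = rng.standard_normal((L, V))
x_t = interpolate(e, z, t=0.7)            # noisy state at an intermediate time

# A learned flow map F(x_t, t) would transport x_t straight to clean-token
# logits in one step.  Stand-in here: an untrained linear head.
W = rng.standard_normal((V, V)) * 0.1
def flow_map(x, t):
    return x @ W                          # (L, V) logits

def softmax(u):
    u = u - u.max(axis=-1, keepdims=True)
    p = np.exp(u)
    return p / p.sum(axis=-1, keepdims=True)

# One-step generation: pure noise -> logits -> tokens.
logits = flow_map(rng.standard_normal((L, V)), t=1.0)
probs = softmax(logits)                   # rows lie on the simplex
sampled = probs.argmax(axis=-1)

# Cross-entropy training signal against the clean tokens, which respects
# the simplex geometry of the targets.
ce = -np.log(probs[np.arange(L), tokens] + 1e-12).mean()
```

The key structural point the abstract makes is visible even in this toy: because the state lives in a continuous space, a single map from noise to logits is well-defined and can be distilled directly, whereas discrete diffusion has no analogous one-step transport.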

Chanhyuk Lee, Jaehoon Yoo, Manan Agarwal, Sheel Shah, Jerry Huang, Aditi Raghunathan, Seunghoon Hong, Nicholas M. Boffi, Jinwoo Kim • 2026

Related benchmarks

Task                 Dataset                     Metric              Result    Rank
Language Modeling    OWT                         Gen. PPL            62.23     61
Language Modeling    LM1B                        PPL (Generalized)   90.9      55
Language Modeling    LM1B (val)                  --                  --        55
Text Generation      OWT                         GPT2 Perplexity     108.2     41
Text Generation      LM1B                        Perplexity (PPL)    98.76     24
Language Modeling    OpenWebText (OWT) (test)    --                  --        8
Language Modeling    OWT (val)                   --                  --        7

Other info

GitHub
