
Flow Map Language Models: One-step Language Modeling via Continuous Denoising

About

Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. Despite their promise, these models typically produce samples whose quality sharply degrades in the few-step regime, preventing a dramatic speedup in practice. Here, we show that language models based on continuous flows over one-hot token embeddings can outperform discrete diffusion in both quality and speed. Importantly, our continuous formulation defines a unique flow map that can be learned directly for efficient few-step inference, a structure we show is unavailable to discrete methods. In this setting, we show that both the flow and its associated flow map can be learned with simple cross-entropy objectives that respect the simplex geometry of the data, and we identify three distinct choices for flow map distillation whose performance we compare in practice. Using these insights, we build a flow language model (FLM), a continuous flow that matches state-of-the-art discrete diffusion baselines on the One Billion Words (LM1B) and OpenWebText (OWT) datasets. We then distill FLM into a flow map language model (FMLM), whose one-step generation exceeds the 8-step quality of recent few-step discrete diffusion language models. Our work challenges the widely-held hypothesis that discrete noising processes are necessary for generative modeling over discrete modalities and paves the way toward accelerated language modeling at scale. Code is available at https://github.com/david3684/flm.
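To make the abstract's core idea concrete, here is a minimal, hedged sketch of the pipeline it describes: tokens are embedded as one-hot vertices of the simplex, noised along a continuous path, and a flow map takes a noisy state directly to clean-token logits in one step, trained with a simplex-respecting cross-entropy. Everything below is illustrative only — the interpolation path, the `flow_map` stand-in (an untrained linear head), and all names are assumptions, not the paper's actual architecture or objective.

```python
import numpy as np

rng = np.random.default_rng(0)
V, L = 8, 4                               # toy vocabulary size and sequence length

# Clean data: one-hot token embeddings, i.e. vertices of the simplex.
tokens = rng.integers(0, V, size=L)
e = np.eye(V)[tokens]                     # (L, V) one-hot rows

# A linear interpolant between data and Gaussian noise (hypothetical choice;
# the paper's exact continuous path may differ).
def interpolate(e, z, t):
    return (1.0 - t) * e + t * z

z = rng.standard_normal((L, V))
x_t = interpolate(e, z, t=0.7)            # noisy state at an intermediate time

# A learned flow map F(x_t, t) would transport x_t straight to clean-token
# logits in one step.  Stand-in here: an untrained linear head.
W = rng.standard_normal((V, V)) * 0.1
def flow_map(x, t):
    return x @ W                          # (L, V) logits

def softmax(u):
    u = u - u.max(axis=-1, keepdims=True)
    p = np.exp(u)
    return p / p.sum(axis=-1, keepdims=True)

# One-step generation: pure noise -> logits -> tokens.
logits = flow_map(rng.standard_normal((L, V)), t=1.0)
probs = softmax(logits)                   # rows lie on the simplex
sampled = probs.argmax(axis=-1)

# Cross-entropy training signal against the clean tokens, which respects
# the simplex geometry of the targets.
ce = -np.log(probs[np.arange(L), tokens] + 1e-12).mean()
```

The key structural point the abstract makes is visible even in this toy: because the state lives in a continuous space, a single map from noise to logits is well-defined and can be distilled directly, whereas discrete diffusion has no analogous one-step transport.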

Chanhyuk Lee, Jaehoon Yoo, Manan Agarwal, Sheel Shah, Jerry Huang, Aditi Raghunathan, Seunghoon Hong, Nicholas M. Boffi, Jinwoo Kim • 2026

Related benchmarks

Task                 Dataset                     Metric              Result    Rank
Language Modeling    OWT                         Gen. PPL            62.23     61
Language Modeling    LM1B                        PPL (Generalized)   90.9      55
Language Modeling    LM1B (val)                  --                  --        55
Text Generation      OWT                         GPT2 Perplexity     108.2     41
Text Generation      LM1B                        Perplexity (PPL)    98.76     24
Language Modeling    OpenWebText (OWT) (test)    --                  --        8
Language Modeling    OWT (val)                   --                  --        7

Other info

GitHub
