Language Modeling with Hyperspherical Flows

About

Discrete diffusion language models have progressed rapidly as an alternative to autoregressive (AR) models, motivated by their ability to generate tokens in parallel. However, for tractability, discrete diffusion models sample from a factorized distribution, which is less expressive than an AR model. Recent Flow Language Models (FLMs) apply continuous flows to language, transporting noise to data with a deterministic ODE that avoids factorized sampling. FLMs operate on one-hot vectors whose dimension scales with the vocabulary size, which makes them costly to train. Moreover, since all distinct one-hot embeddings are equidistant in $\ell_2$, adding Gaussian noise has no clear semantic interpretation (unlike images, where Gaussian noise progressively degrades structure). We introduce $\mathbb{S}$-FLM, a latent FLM on the hypersphere. $\mathbb{S}$-FLM generates sequences by rotating vectors in $\mathbb{S}^{d-1}$ along a velocity field learned with cross-entropy, avoiding the overhead of materializing one-hot vectors. Previous FLMs match AR models in Generative Perplexity (Gen. PPL), but samples with high likelihood are not necessarily correct in verifiable domains such as math and code. $\mathbb{S}$-FLM substantially improves continuous flow language models on large-vocabulary reasoning and closes the gap to masked diffusion under standard-temperature sampling ($T=1$), while a gap remains under optimized low-temperature ($T=0.1$) decoding.
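
The two geometric ideas in the abstract can be illustrated with a small pure-Python sketch. First, distinct one-hot vectors are all exactly $\sqrt{2}$ apart in $\ell_2$, so distance carries no information about which tokens are similar. Second, a point can be moved along the unit sphere $\mathbb{S}^{d-1}$ by "rotating" it: Euler steps taken through the exponential map stay on the sphere. The assumptions here are ours, not the paper's: the noise-to-data path is taken to be the great-circle (geodesic) arc, and a predicted token distribution is turned into a target point by normalizing the probability-weighted mean of unit token embeddings (the hypothetical helper `expected_target`). The fixed `probs` vector stands in for the output of a network that would be trained with cross-entropy. This is an illustration of spherical flows in general, not the $\mathbb{S}$-FLM implementation.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def normalize(v):
    n = norm(v)
    return [x / n for x in v]


def one_hot(i, vocab_size):
    return [1.0 if j == i else 0.0 for j in range(vocab_size)]


def log_map(x, y):
    """Tangent vector at x pointing along the great circle toward y; its length is the angle x->y."""
    cos_theta = max(-1.0, min(1.0, dot(x, y)))  # clamp against rounding before acos
    theta = math.acos(cos_theta)
    u = [b - cos_theta * a for a, b in zip(x, y)]  # component of y orthogonal to x
    u_norm = norm(u)
    if u_norm < 1e-12:  # x == y (or antipodal): no unique direction
        return [0.0] * len(x)
    return [theta * c / u_norm for c in u]


def exp_map(x, v):
    """Follow the great circle from x in direction v for arc length |v| (stays on the unit sphere)."""
    v_norm = norm(v)
    if v_norm < 1e-12:
        return list(x)
    return [math.cos(v_norm) * a + math.sin(v_norm) * b / v_norm for a, b in zip(x, v)]


def expected_target(probs, embeddings):
    """Hypothetical: map a predicted token distribution to a point on the sphere."""
    d = len(embeddings[0])
    mean = [sum(p * e[k] for p, e in zip(probs, embeddings)) for k in range(d)]
    return normalize(mean)


def integrate(x0, target_fn, num_steps):
    """Euler steps on the sphere: x <- exp_x(h * log_x(target) / (1 - t)).

    For a geodesic path toward a fixed target, log_x(target) / (1 - t) is the
    exact velocity, so each step lands on the arc and the last step hits the target.
    """
    x, h = list(x0), 1.0 / num_steps
    for step in range(num_steps):
        t = step * h
        velocity = [c / (1.0 - t) for c in log_map(x, target_fn(x, t))]
        x = exp_map(x, [h * c for c in velocity])
    return x


# One-hot codes: every pair of distinct tokens is sqrt(2) apart in l2.
vocab = [one_hot(i, 5) for i in range(5)]
dists = {round(norm([a - b for a, b in zip(vocab[i], vocab[j])]), 6)
         for i in range(5) for j in range(5) if i != j}
print(dists)  # {1.414214}

# Transport Gaussian noise, projected onto S^{d-1}, to a token embedding.
random.seed(0)
d = 8
embeddings = [normalize([random.gauss(0, 1) for _ in range(d)]) for _ in range(5)]
x0 = normalize([random.gauss(0, 1) for _ in range(d)])
probs = [0.0, 0.0, 1.0, 0.0, 0.0]  # stand-in for a model's softmax output
x1 = integrate(x0, lambda x, t: expected_target(probs, embeddings), num_steps=10)
print(round(dot(x1, embeddings[2]), 6))  # 1.0
```

With a confident (one-hot) `probs`, the trajectory ends on token 2's embedding. With a spread-out distribution, `expected_target` would point somewhere between several embeddings, which is one way a rotation-based sampler could hedge between tokens early in generation.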

Justin Deschenaux, Caglar Gulcehre• 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Sudoku Puzzle Solving | Sudoku Medium, 35/81 digits visible (test) | Exact Match Accuracy (%) | 85.2 | 10
Sudoku Puzzle Solving | Sudoku Easy, 40/81 digits visible (test) | Exact Match Accuracy (%) | 94.8 | 10
Sudoku Puzzle Solving | Sudoku Hard, 30/81 digits visible (test) | Exact Match Accuracy (%) | 45.0 | 10
