
Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets

About

Standard autoregressive language models generate text token-by-token from a fixed vocabulary; when token sampling is viewed as an action, this induces a tree-structured state space, which limits flexibility and expressiveness. Recent work introduces a dynamic vocabulary by sampling retrieved text spans, but overlooks that the same sentence can be composed of spans of varying lengths, and so lacks explicit modeling of the directed acyclic graph (DAG) state space. This restricts exploration of compositional paths and biases the model toward the chosen path. Generative Flow Networks (GFlowNets) are powerful for efficiently exploring and generalizing over state spaces, particularly those with a DAG structure. However, prior GFlowNet-based language models operate at the token level and remain confined to tree-structured spaces, limiting their potential. In this work, we propose Flow of Spans (FoSS), a principled GFlowNets framework for span generation. FoSS constructs a dynamic span vocabulary by flexibly segmenting the retrieved text, ensuring a DAG-structured state space, which allows GFlowNets to explore diverse compositional paths and improve generalization. With specialized reward models, FoSS generates diverse, high-quality text. Empirically, FoSS improves MAUVE scores by up to 12.5% over the Transformer baseline on text generation and achieves 3.5% gains on knowledge-intensive tasks, consistently outperforming state-of-the-art methods. Scaling experiments further demonstrate that FoSS benefits from larger models, more data, and richer retrieval corpora, retaining its advantage over strong baselines.
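To see why span-level generation induces a DAG rather than a tree: every prefix of the target sentence is a single state, reachable by emitting spans of different lengths, so many generation paths merge at shared states. A minimal sketch (not from the paper; `count_segmentations` and `max_span` are illustrative names) counts those compositional paths with dynamic programming:

```python
def count_segmentations(tokens, max_span=3):
    """Count the distinct ways to compose `tokens` from contiguous spans
    of length 1..max_span. Each prefix length is one DAG state, so many
    generation paths share states -- unlike token-level generation,
    where every path visits its own private tree nodes."""
    n = len(tokens)
    counts = [0] * (n + 1)
    counts[0] = 1  # the empty prefix: one way to start
    for i in range(1, n + 1):
        # state i (prefix of length i) is reachable from any state j = i - span_len
        # by emitting the span tokens[j:i], for every admissible span length
        for span_len in range(1, min(max_span, i) + 1):
            counts[i] += counts[i - span_len]
    return counts[n]

sentence = "the cat sat on the mat".split()  # 6 tokens
print(count_segmentations(sentence))  # → 24 compositional paths
```

Even this 6-token sentence admits 24 distinct segmentation paths through the DAG; a token-level model commits to exactly one, which is the exploration gap the abstract attributes to prior dynamic-vocabulary methods.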

Bo Xue, Yunchong Song, Fanghao Shao, Xuekai Zhu, Lin Chen, Luoyi Fu, Xinbing Wang, Zhouhan Lin • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | MedQA-USMLE (test) | Accuracy | 25.27 | 101 |
| Question Answering | ARC Challenge (test) | Accuracy | 24.63 | 63 |
| Question Answering | MedMCQA (test) | -- | -- | 48 |
| Open-ended Text Generation | WikiText-103 (test) | MAUVE | 0.3165 | 26 |
| Open-ended Text Generation | Law-MT Out of Domain (test) | MAUVE | 32.17 | 16 |
| Open-ended Text Generation | Scaling Data Store | MAUVE | 33.79 | 12 |
| Question Answering | TruthfulQA (test) | Accuracy | 30.45 | 4 |
| Text Quality Assessment | WikiText-103 In Domain (test) | GPT-4 Preference Ratio (Better) | 0.67 | 4 |
| Text Quality Assessment | Law-MT Out of Domain (test) | GPT-4 Preference Ratio (Better) | 0.75 | 4 |
