Minibatch Optimal Transport and Perplexity Bound Estimation in Discrete Flow Matching
About
Discrete flow matching, a recent framework for modeling categorical data, has shown performance competitive with autoregressive models. However, unlike in continuous flow matching, the rectification strategy cannot be applied because discrete paths are stochastic, necessitating alternative methods to minimize the number of state transitions. We propose a dynamic-optimal-transport-like minimization objective and derive its Kantorovich formulation for discrete flows with convex interpolants, where the transport cost depends solely on inter-state similarity and can be optimized via minibatch strategies. We show that such methods can reduce the number of transitions by up to a factor of 32 (from 1024 to 32) while reaching the same generative perplexity, without compromising diversity. Additionally, path nondeterminism in discrete flows precludes an analogue of the instantaneous change of variables, so the precise probability estimation available to continuous flows is out of reach. We therefore propose two upper bounds on perplexity, enabling principled training, evaluation, and model comparison. Finally, we introduce Multimask Flows, which outperform masked flows in generative perplexity without compromising diversity, particularly when combined with minibatch optimal transport.
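The minibatch strategy described above can be sketched as follows: within each batch, source and target states are re-paired so that the total transport cost, measured purely by inter-state similarity, is minimized. This is a minimal illustrative sketch, not the paper's exact method; the Hamming-distance cost and all function names here are assumptions.

```python
# Sketch of a minibatch optimal-transport coupling for discrete sequences.
# Assumption: transport cost = Hamming distance (number of differing tokens),
# which depends only on inter-state similarity, as in the abstract.
import numpy as np
from scipy.optimize import linear_sum_assignment

def hamming_cost(x0, x1):
    """Pairwise Hamming distances between two batches of token sequences.

    x0, x1: integer arrays of shape (B, L).
    Returns a (B, B) matrix; entry (i, j) counts positions where
    x0[i] and x1[j] disagree, i.e. transitions needed to move between them.
    """
    return (x0[:, None, :] != x1[None, :, :]).sum(axis=-1)

def minibatch_ot_pairing(x0, x1):
    """Re-pair the batch to minimize the total number of state transitions."""
    cost = hamming_cost(x0, x1)
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm, O(B^3)
    return x1[cols]  # row i of the result is now coupled with x0[i]

rng = np.random.default_rng(0)
x0 = rng.integers(0, 5, size=(8, 16))  # 8 sequences, length 16, vocab 5
x1 = rng.integers(0, 5, size=(8, 16))
paired = minibatch_ot_pairing(x0, x1)
# The OT pairing never requires more transitions than the identity pairing:
assert hamming_cost(x0, paired).trace() <= hamming_cost(x0, x1).trace()
```

Because the cost decomposes over batch pairs, the coupling can be recomputed cheaply for every minibatch during training rather than solved once over the full dataset.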
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 | Perplexity (PPL) | 42 | 841 |
| Language Modeling | PTB | Perplexity | 111.6 | 650 |
| Language Modeling | WikiText-103 | PPL | 41.64 | 146 |
| Language Modeling | LAMBADA | Perplexity | 53.19 | 99 |
| Text Generation | OpenWebText | Perplexity | 132.6 | 66 |
| Language Modeling | LM1B | Perplexity | 77.87 | 7 |