Minibatch Optimal Transport and Perplexity Bound Estimation in Discrete Flow Matching
About
Discrete flow matching, a recent framework for modeling categorical data, has shown performance competitive with autoregressive models. However, unlike in continuous flow matching, the rectification strategy cannot be applied because discrete paths are stochastic, necessitating alternative methods to minimize the number of state transitions. We propose a dynamic-optimal-transport-like minimization objective and derive its Kantorovich formulation for discrete flows with convex interpolants, where the transport cost depends solely on inter-state similarity and can be optimized via minibatch strategies. We show that such methods can reduce the number of transitions by up to a factor of 32 (from 1024 to 32) while reaching the same generative perplexity, without compromising diversity. Additionally, path nondeterminism in discrete flows precludes an analogue of the instantaneous change of variables, so the precise probability estimation available to continuous flows is out of reach. We therefore propose two upper bounds on perplexity, enabling principled training, evaluation, and model comparison. Finally, we introduce Multimask Flows, which outperform masked flows in generative perplexity without compromising diversity, particularly when combined with minibatch optimal transport.
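The minibatch strategy described above can be sketched as follows: within each batch, source and target states are re-paired so that the total transport cost, measured purely by inter-state similarity, is minimized. This is a minimal illustrative sketch, not the paper's exact method; the Hamming-distance cost and all function names here are assumptions.

```python
# Sketch of a minibatch optimal-transport coupling for discrete sequences.
# Assumption: transport cost = Hamming distance (number of differing tokens),
# which depends only on inter-state similarity, as in the abstract.
import numpy as np
from scipy.optimize import linear_sum_assignment

def hamming_cost(x0, x1):
    """Pairwise Hamming distances between two batches of token sequences.

    x0, x1: integer arrays of shape (B, L).
    Returns a (B, B) matrix; entry (i, j) counts positions where
    x0[i] and x1[j] disagree, i.e. transitions needed to move between them.
    """
    return (x0[:, None, :] != x1[None, :, :]).sum(axis=-1)

def minibatch_ot_pairing(x0, x1):
    """Re-pair the batch to minimize the total number of state transitions."""
    cost = hamming_cost(x0, x1)
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm, O(B^3)
    return x1[cols]  # row i of the result is now coupled with x0[i]

rng = np.random.default_rng(0)
x0 = rng.integers(0, 5, size=(8, 16))  # 8 sequences, length 16, vocab 5
x1 = rng.integers(0, 5, size=(8, 16))
paired = minibatch_ot_pairing(x0, x1)
# The OT pairing never requires more transitions than the identity pairing:
assert hamming_cost(x0, paired).trace() <= hamming_cost(x0, x1).trace()
```

Because the cost decomposes over batch pairs, the coupling can be recomputed cheaply for every minibatch during training rather than solved once over the full dataset.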
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 | Perplexity (PPL) | 42 | 841 |
| Language Modeling | PTB | Perplexity | 111.6 | 650 |
| Language Modeling | WikiText-103 | PPL | 41.64 | 146 |
| Language Modeling | LAMBADA | Perplexity | 53.19 | 99 |
| Text Generation | OpenWebText | Perplexity | 132.6 | 66 |
| Language Modeling | LM1B | Perplexity | 77.87 | 7 |