Flow Matching for Count Data

About

High-dimensional count data arise in applications such as single-cell RNA sequencing and neural spike trains, where mappings between distributions across successive batches or time points form a critical component of data analysis. The recent success of diffusion- and flow-based deep generative models for images, video, and text motivates extending these ideas to count-valued settings, but many existing methods either treat each count as a categorical state or transform counts into a continuous space, neither of which is natural or efficient when the count range is large. We propose count-FM, a flow-matching framework for count data based on a continuous-time birth-death process with local unit jumps. Count-FM learns marginal transitions efficiently in count space through simulation-free training of conditional transition rates, allowing transport between arbitrary count-distributed source and target populations. In simulation, count-FM achieves better sample quality than representative baselines while using substantially fewer parameters. We further apply count-FM to scRNA-seq and neural spike-train data for unconditional generation, transport, and conditional generation. Across these tasks, count-FM yields improved sample quality, greater modeling efficiency, and interpretable transport paths.
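
To make the phrase "continuous-time birth-death process with local unit jumps" concrete, here is a minimal sketch of simulating such a process with the standard Gillespie algorithm. It is not the count-FM implementation: the function name `simulate_birth_death` and the rate functions `birth_rate` and `death_rate` are hypothetical, and the rates here depend only on the current count, whereas a learned model would presumably use trained, time-dependent rates.

```python
import random


def simulate_birth_death(x0, birth_rate, death_rate, t_end, rng):
    """Simulate one path of a birth-death process on {0, 1, 2, ...}.

    birth_rate(x) and death_rate(x) are hypothetical, time-homogeneous
    rate functions (an assumption made for this sketch only).
    Returns the final count and the list of (time, count) jump events.
    """
    x, t = x0, 0.0
    path = [(t, x)]
    while True:
        b = birth_rate(x)
        d = death_rate(x) if x > 0 else 0.0  # counts cannot drop below zero
        total = b + d
        if total <= 0.0:
            break  # absorbing state: no further jumps possible
        t += rng.expovariate(total)  # exponential waiting time to next jump
        if t >= t_end:
            break
        # Unit jump: +1 (birth) with probability b / total, else -1 (death).
        x += 1 if rng.random() < b / total else -1
        path.append((t, x))
    return x, path


# Example: constant birth rate, death rate proportional to the count.
rng = random.Random(0)
x_final, path = simulate_birth_death(
    5, lambda x: 2.0, lambda x: 0.5 * x, 3.0, rng
)
```

Every transition changes the count by exactly one, so the state stays on the non-negative integers without any categorical encoding or continuous relaxation.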

Ganchao Wei, John Pearson • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Generative Modeling	2D simulation Gamma-Poisson mixture	W2 Distance	2.879	8
Unconditional Generation	Dentate Gyrus (test)	W2 Score	20.456	7
Conditional Generation	PCx (held-out)	Mean RMSE	1.082	5
Conditional Generation	hc-3 linear-track session	RMSEµ	0.026	5
