Flow Matching for Count Data
About
High-dimensional count data arise in applications such as single-cell RNA sequencing and neural spike trains, where mapping between distributions across successive batches or time points is a critical component of data analysis. The recent success of diffusion- and flow-based deep generative models for images, video, and text motivates extending these ideas to count-valued settings. However, many existing methods either treat each count as a categorical state or transform counts into a continuous space, and neither approach is natural or efficient when the count range is large. We propose count-FM, a flow-matching framework for count data based on a continuous-time birth-death process with local unit jumps. Count-FM learns marginal transitions efficiently in count space through simulation-free training of conditional transition rates, allowing transport between arbitrary count-distributed source and target populations. In simulations, count-FM achieves better sample quality than representative baselines while using substantially fewer parameters. We further apply count-FM to scRNA-seq and neural spike-train data for unconditional generation, transport, and conditional generation. Across these tasks, count-FM yields improved sample quality, greater modeling efficiency, and interpretable transport paths.
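To make the idea of transport in count space through unit jumps concrete, the following is a minimal sketch and not the count-FM method itself. It assumes a simple conditional bridge between a paired source count `x0` and target count `x1`: each of the `|x1 - x0|` required unit jumps fires at an independent uniform time in [0, 1], which corresponds to a conditional jump rate of `|x1 - x_t| / (1 - t)`. The function name `simulate_unit_jump_bridge`, the time grid, and this choice of rate are illustrative assumptions. In a learned model, the conditional rate would presumably be replaced by a network-predicted marginal rate.

```python
import numpy as np


def simulate_unit_jump_bridge(x0, x1, n_steps=50, rng=None):
    """Simulate a unit-jump path from counts x0 to counts x1 on a time grid.

    Illustrative assumption (not taken from the paper): each remaining unit
    jump fires with hazard 1 / (1 - t), so the total conditional rate is
    |x1 - x_t| / (1 - t) and the path reaches x1 exactly at t = 1.
    Returns an array of shape (n_steps + 1, *x0.shape).
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=np.int64)
    x1 = np.asarray(x1, dtype=np.int64)

    x = x0.copy()
    path = [x.copy()]
    for k in range(n_steps):
        # Signed number of unit jumps still needed in each dimension.
        remaining = x1 - x
        # With h = 1/n_steps and t = k*h, h / (1 - t) simplifies to
        # 1 / (n_steps - k); it equals exactly 1.0 at the last step.
        p = 1.0 / (n_steps - k)
        # Number of remaining jumps that fire during this step.
        fired = rng.binomial(np.abs(remaining), p)
        # Births where x1 > x, deaths where x1 < x, nothing where equal.
        x = x + np.sign(remaining) * fired
        path.append(x.copy())
    return np.stack(path)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    path = simulate_unit_jump_bridge([0, 5, 100], [7, 5, 3], rng=rng)
    print(path[0], path[len(path) // 2], path[-1])
```

Because the path is discretized, several unit jumps can land in the same grid step; a finer grid approaches the continuous-time process in which jumps occur one at a time. The intermediate states stay on the integer lattice throughout, which is the property that distinguishes this kind of transport from methods that embed counts in a continuous space.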
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Generative Modeling | 2D simulation Gamma-Poisson mixture | W2 Distance | 2.879 | 8 |
| Unconditional Generation | Dentate Gyrus (test) | W2 Score | 20.456 | 7 |
| Conditional Generation | PCx (held-out) | Mean RMSE | 1.082 | 5 |
| Conditional Generation | hc-3 linear-track session | RMSEµ | 0.026 | 5 |