Count Bridges enable Modeling and Deconvolving Transcriptomic Data

About

Many modern biological assays, including RNA sequencing, yield integer-valued counts that reflect the number of molecules detected. These measurements are often not at the desired resolution: while the unit of interest is typically a single cell, many measurement technologies produce counts aggregated over sets of cells. Although recent generative frameworks such as diffusion and flow matching have been extended to non-Euclidean and discrete settings, it remains unclear how best to model integer-valued data or how to systematically deconvolve aggregated observations. We introduce Count Bridges, a stochastic bridge process on the integers that provides an exact, tractable analogue of diffusion-style models for count data, with closed-form conditionals for efficient training and sampling. We extend this framework to enable direct training from aggregated measurements via an Expectation-Maximization-style approach that treats unit-level counts as latent variables. We demonstrate state-of-the-art performance on integer distribution matching benchmarks, comparing against flow matching and discrete flow matching baselines across various metrics. We then apply Count Bridges to two large-scale problems in biology: modeling single-cell gene expression data at the nucleotide resolution, with applications to deconvolving bulk RNA-seq, and resolving multicellular spatial transcriptomic spots into single-cell count profiles. Our methods offer a principled foundation for generative modeling and deconvolution of biological count data across scales and modalities.
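
To make "closed-form conditionals" on the integers concrete, here is a minimal sketch of the simplest bridge of this kind. It is not the Count Bridges process itself, whose construction the abstract does not spell out. It assumes a monotone, homogeneous Poisson-type counting process pinned at integer endpoints X_0 = x0 and X_1 = x1 with x0 <= x1. For such a process the jump times are i.i.d. uniform on [0, 1] given the endpoints, so the number of jumps completed by time t is Binomial(x1 - x0, t). The function name `sample_bridge_state` and its parameters are illustrative only.

```python
import random


def sample_bridge_state(x0, x1, t, rng):
    """Sample X_t given X_0 = x0 and X_1 = x1 for a unit-jump counting process.

    Assumption (not from the paper): jumps only go up by one and arrive
    homogeneously in time, so conditional on the endpoints each of the
    (x1 - x0) jumps has happened by time t independently with probability t.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if x1 < x0:
        raise ValueError("this monotone sketch requires x0 <= x1")
    n_jumps = x1 - x0
    # Binomial(n_jumps, t) drawn as a sum of Bernoulli(t) indicators,
    # which keeps the example free of third-party dependencies.
    completed = sum(1 for _ in range(n_jumps) if rng.random() < t)
    return x0 + completed


rng = random.Random(0)
# Intermediate states are always integers between the two endpoints.
path_samples = [sample_bridge_state(3, 10, 0.25, rng) for _ in range(2000)]
```

Because the conditional is available in closed form, an intermediate state at any time t can be drawn directly from the endpoints without simulating the whole path, which is the property that makes diffusion-style training efficient.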

Nic Fishman, Gokul Gowri, Tanush Kumar, Jiaqi Lu, Valentin de Bortoli, Jonathan S. Gootenberg, Omar Abudayyeh • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Cell-type proportion deconvolution | PBMC scRNA-seq synthetic bulk (held-out 10% of patients) | JSD 0.113 | 3 |
| Cell-type deconvolution | MERFISH | JSD 0.229 | 3 |
| Gene expression count profile deconvolution | MERFISH mouse brain | MMD 0.203 | 2 |
| Gene expression count profile deconvolution | PBMC (10% held-out individuals) | MMD 0.446 | 2 |
| Gene expression prediction | PBMC scRNA-seq (test) | Bulk MSE 0.601 | 2 |
| Spatial transcriptomic deconvolution | MERFISH mouse brain dataset, synthetic spot-level aggregates (Vizgen 2021) | JSD 0.231 | 2 |
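
Several rows above report JSD (Jensen-Shannon divergence) between predicted and reference cell-type proportions. As a rough guide to how such a score could be computed, the sketch below implements the standard definition for two discrete distributions. The log base (2 here, which bounds the value in [0, 1]), the absence of a square root, and any averaging over samples or spots are assumptions, since the listing does not state the benchmark's exact convention. The function name `jensen_shannon_divergence` is illustrative.

```python
import math


def jensen_shannon_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two non-negative weight vectors.

    Inputs are normalized to sum to one, so raw counts or proportions both work.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each input must have positive total mass")
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Mixture distribution; it is positive wherever p or q is positive.
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i = 0 contribute zero by the usual convention.
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical proportions over three cell types, for illustration only.
predicted = [0.5, 0.3, 0.2]
reference = [0.4, 0.4, 0.2]
score = jensen_shannon_divergence(predicted, reference)
```

Lower values indicate closer agreement, with zero reached only when the two proportion vectors coincide.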
