A versatile informative diffusion model for single-cell ATAC-seq data generation and analysis

About

The rapid advancement of single-cell ATAC sequencing (scATAC-seq) technologies holds great promise for investigating the heterogeneity of epigenetic landscapes at the cellular level. However, the amplification process in scATAC-seq experiments often introduces noise through dropout events, and the resulting extreme sparsity hinders accurate analysis. Consequently, there is significant demand for generating high-quality scATAC-seq data in silico. Furthermore, current methods are typically task-specific and lack a versatile framework capable of handling multiple tasks within a single model. In this work, we propose ATAC-Diff, a versatile framework based on a latent diffusion model that is conditioned on latent auxiliary variables so that it can adapt to various tasks. ATAC-Diff is the first diffusion model for scATAC-seq data generation and analysis. It includes auxiliary modules that encode latent high-level variables, enabling the model to learn the semantic information needed to sample high-quality data. With a Gaussian Mixture Model (GMM) as the latent prior and an auxiliary decoder, the resulting latent variables retain refined genomic information that is beneficial for downstream analyses. Another innovation is the incorporation of the mutual information between observed and hidden variables as a regularization term, which prevents the model from decoupling from the latent variables. Through extensive experiments, we demonstrate that ATAC-Diff achieves high performance in both generation and analysis tasks, outperforming state-of-the-art models.
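
The conditioning idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the dimensions, the linear noise schedule, and every name (`sample_gmm`, `forward_noise`, `denoiser_input`) are assumptions made for illustration, and random vectors stand in for encoded accessibility profiles. The sketch draws an auxiliary latent variable from a diagonal-covariance GMM prior, applies the standard DDPM forward noising step to a data latent, and assembles the input that a conditional denoiser would receive.

```python
import numpy as np

def sample_gmm(rng, weights, means, stds, n):
    """Draw n vectors from a diagonal-covariance Gaussian mixture."""
    comps = rng.choice(len(weights), size=n, p=weights)  # mixture component per cell
    noise = rng.standard_normal((n, means.shape[1]))
    return means[comps] + stds[comps] * noise, comps

def forward_noise(rng, x0, t, alpha_bar):
    """Standard DDPM forward step: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for the example.
n_cells, latent_dim, aux_dim, n_clusters, T = 16, 32, 8, 3, 100

# Hypothetical GMM prior over the auxiliary latent variable.
weights = np.full(n_clusters, 1.0 / n_clusters)
means = 3.0 * rng.standard_normal((n_clusters, aux_dim))
stds = np.ones((n_clusters, aux_dim))

# Assumed linear beta schedule and its cumulative product.
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

# Random stand-in for latent encodings of scATAC-seq profiles.
x0 = rng.standard_normal((n_cells, latent_dim))

c, comps = sample_gmm(rng, weights, means, stds, n_cells)
t = 50
xt, eps = forward_noise(rng, x0, t, alpha_bar)

# A conditional denoiser would be trained to predict eps from (xt, t, c).
denoiser_input = np.concatenate([xt, c], axis=1)
```

In a trained model the auxiliary variable would come from an encoder rather than directly from the prior, and a neural network would consume `denoiser_input` together with the timestep; both are omitted here.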

Lei Huang, Lei Xiong, Na Sun, Zunpeng Liu, Ka-Chun Wong, Manolis Kellis • 2024

Related benchmarks

Task                                 Dataset        Metric  Result  Rank
Denoising                            PBMC10k        SCC     0.863   6
Conditional scATAC-seq Generation    Forebrain      SCC     0.688   3
Conditional scATAC-seq Generation    Hematopoiesis  SCC     0.85    3
Conditional scATAC-seq Generation    PBMC10k        SCC     0.846   3
Imputation                           Hematopoiesis  SCC     0.892   3
Unconditional scATAC-seq Generation  Forebrain      SCC     0.925   3
Unconditional scATAC-seq Generation  Hematopoiesis  SCC     0.927   3
Unconditional scATAC-seq Generation  PBMC 10k       SCC     0.964   3
Denoising                            Forebrain      SCC     0.718   3
Denoising                            Hematopoiesis  SCC     0.84    3
Showing 10 of 11 rows
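
The listing does not expand the SCC abbreviation used in the benchmark results. Assuming it denotes the Spearman correlation coefficient, a score of this kind could be computed as the Pearson correlation of average ranks, as in the sketch below. The function names and the toy vectors are hypothetical, and the page does not say how the benchmark aggregates cells or peaks before scoring.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace the ranks inside each group of ties with the group average.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return sums[inverse] / counts[inverse]

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

# Toy example (made-up numbers): per-peak accessibility averaged over
# real cells versus generated cells.
real_profile = [0.10, 0.00, 0.45, 0.30, 0.00, 0.80]
generated_profile = [0.12, 0.01, 0.40, 0.35, 0.00, 0.70]
score = spearman(real_profile, generated_profile)
```

Rank-based correlation is insensitive to monotone rescaling, which can make it a convenient choice for sparse count-like data, though whether that motivated the metric here is not stated on the page.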
