Directed Graphical Models and Causal Discovery for Zero-Inflated Data

About

Modern RNA sequencing technologies provide gene expression measurements from single cells that promise refined insights into regulatory relationships among genes. Directed graphical models are well-suited to explore such (cause-effect) relationships. However, statistical analyses of single-cell data are complicated by the fact that the data often show zero-inflated expression patterns. To address this challenge, we propose directed graphical models that are based on Hurdle conditional distributions parametrized in terms of polynomials in parent variables and their 0/1 indicators of being zero or nonzero. While directed graphs for Gaussian models are only identifiable up to an equivalence class in general, we show that, under a natural and weak assumption, the exact directed acyclic graph of our zero-inflated models can be identified. We propose methods for graph recovery, apply our model to real single-cell RNA-seq data on T helper cells, and show simulated experiments that validate the identifiability and graph estimation methods in practice.
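To make the idea of a Hurdle conditional concrete, here is a minimal Python sketch of a two-node graph (parent, then child). It is an illustration under stated assumptions, not the parametrization used in the paper: the logistic link for the zero/nonzero part, the Gaussian nonzero part, the degree-1 polynomials, and every coefficient value are hypothetical choices, as is the helper name `sample_hurdle_child`.

```python
import numpy as np

def sample_hurdle_child(x_parent, rng, g=(0.0, 1.0, 0.5), h=(1.0, 0.5, 0.3), sigma=1.0):
    """Sample a zero-inflated child given one parent (illustrative only).

    g and h hold (intercept, indicator coefficient, value coefficient) of
    degree-1 polynomials in the parent's 0/1 indicator and its value.
    These coefficients are assumptions chosen for the example.
    """
    x_parent = np.asarray(x_parent, dtype=float)
    ind = (x_parent != 0).astype(float)           # 0/1 indicator of a nonzero parent
    logit = g[0] + g[1] * ind + g[2] * x_parent   # log-odds that the child is nonzero
    p_nonzero = 1.0 / (1.0 + np.exp(-logit))
    mean = h[0] + h[1] * ind + h[2] * x_parent    # mean of the child when it is nonzero
    is_nonzero = rng.random(x_parent.shape) < p_nonzero
    values = rng.normal(mean, sigma)
    return np.where(is_nonzero, values, 0.0)      # point mass at zero, Gaussian otherwise

rng = np.random.default_rng(0)
n = 1000
# Root node: zero with probability 0.4, otherwise Gaussian (assumed values).
root_nonzero = rng.random(n) < 0.6
x1 = np.where(root_nonzero, rng.normal(1.0, 1.0, n), 0.0)
x2 = sample_hurdle_child(x1, rng)
zero_fraction = float(np.mean(x2 == 0))
print(f"fraction of zeros in child: {zero_fraction:.3f}")
```

In this sketch both the probability of a nonzero child and the child's nonzero mean depend on the parent's value and on whether the parent is zero, which is the kind of dependence the abstract describes.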

Shiqing Yu, Mathias Drton, Ali Shojaie • 2020

Related benchmarks

Task	Dataset	Result	Rank
DAG structure learning	Simulated zero-inflated count data (ER graph) D=50 (test)	TPR 0.016	11
DAG structure learning	Simulated zero-inflated count data BA graph D=50 (test)	TPR 4.5	11
