Estimating high-dimensional directed acyclic graphs with the PC-algorithm
About
We consider the PC-algorithm Spirtes et. al. (2000) for estimating the skeleton of a very high-dimensional acyclic directed graph (DAG) with corresponding Gaussian distribution. The PC-algorithm is computationally feasible for sparse problems with many nodes, i.e. variables, and it has the attractive property to automatically achieve high computational efficiency as a function of sparseness of the true underlying DAG. We prove consistency of the algorithm for very high-dimensional, sparse DAGs where the number of nodes is allowed to quickly grow with sample size n, as fast as O(n^a) for any 0<a<infinity. The sparseness assumption is rather minimal requiring only that the neighborhoods in the DAG are of lower order than sample size n. We empirically demonstrate the PC-algorithm for simulated data and argue that the algorithm is rather insensitive to the choice of its single tuning parameter.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Causal Structure Learning | Theoretical Analysis | Worst-case Cost2 | 9 | |
| Transcription Factor (TF) to target path discovery validation (one-step) | TCGA breast cancer mRNA expression dataset GRCh38 reference genome (Additional random partitions of features) | Forward Proportion (Fwd)2.8 | 7 | |
| Transcription Factor (TF) to target path discovery validation (two-step) | TCGA breast cancer mRNA expression dataset Additional random partitions of features GRCh38 reference genome | Forward Proportion5.1 | 7 | |
| Causal Discovery (One-step paths) | TCGA mRNA expression 2012 | Fwd Score2.8 | 3 | |
| Causal Discovery (Two-step paths) | TCGA mRNA expression 2012 | Fwd Score0.051 | 3 |