Relaxed Sparsest-Permutation Formulation for Causal Discovery at Scale
About
Despite the growing availability of large datasets, causal structure learning remains computationally prohibitive at scale. We revisit sparsest-permutation learning for linear structural equation models and show that exact Cholesky factorization is unnecessary for structure recovery. This observation motivates a support-level relaxation that searches for sparse triangular factors over a precision-support screening graph. The relaxed formulation can be efficiently evaluated via masked zero-fill incomplete Cholesky factorization, enabling scalable comparison of candidate orderings. At the population level, we establish soundness for Markov equivalence class (MEC) recovery under no-cancellation and sparsest Markov representation assumptions, as well as robustness to ordering misspecification. Motivated by these guarantees, we introduce SCOPE, a sparse-Cholesky pipeline that provides a scalable implementation of the relaxed formulation. Experiments on synthetic and real datasets demonstrate that SCOPE matches the MEC recovery accuracy of substantially slower baselines, while achieving significantly reduced runtime and scaling to 10k variables.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Causal Structure Learning | Theoretical Analysis | Worst-case Cost2 | 9 | |
| Transcription Factor (TF) to target path discovery validation (one-step) | TCGA breast cancer mRNA expression dataset GRCh38 reference genome (Additional random partitions of features) | Forward Proportion (Fwd)7.8 | 7 | |
| Transcription Factor (TF) to target path discovery validation (two-step) | TCGA breast cancer mRNA expression dataset Additional random partitions of features GRCh38 reference genome | Forward Proportion10.8 | 7 | |
| CPDAG recovery | High-dimensional SEMs synthetic Uniform noise d=5 p=10^4 | F1 Score64.2 | 4 | |
| CPDAG recovery | High-dimensional synthetic SEMs (Normal noise, d=5) | F1 Score58 | 4 | |
| CPDAG recovery | High-dimensional synthetic SEMs Normal noise, d=1 | F1 Score96.5 | 4 | |
| CPDAG recovery | High-dimensional synthetic SEMs (Uniform noise, d=1) (p=10^4) | F1 Score94.8 | 4 | |
| Causal Discovery (One-step paths) | TCGA mRNA expression 2012 | Fwd Score7.8 | 3 | |
| Causal Discovery (Two-step paths) | TCGA mRNA expression 2012 | Fwd Score0.108 | 3 | |
| CPDAG recovery | Synthetic linear SEMs p=10,000, d=1, n=0.5p, t-distributed noise | F1 Score99.4 | 2 |