Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Relaxed Sparsest-Permutation Formulation for Causal Discovery at Scale

About

Despite the growing availability of large datasets, causal structure learning remains computationally prohibitive at scale. We revisit sparsest-permutation learning for linear structural equation models and show that exact Cholesky factorization is unnecessary for structure recovery. This observation motivates a support-level relaxation that searches for sparse triangular factors over a precision-support screening graph. The relaxed formulation can be efficiently evaluated via masked zero-fill incomplete Cholesky factorization, enabling scalable comparison of candidate orderings. At the population level, we establish soundness for Markov equivalence class (MEC) recovery under no-cancellation and sparsest Markov representation assumptions, as well as robustness to ordering misspecification. Motivated by these guarantees, we introduce SCOPE, a sparse-Cholesky pipeline that provides a scalable implementation of the relaxed formulation. Experiments on synthetic and real datasets demonstrate that SCOPE matches the MEC recovery accuracy of substantially slower baselines, while achieving significantly reduced runtime and scaling to 10k variables.

Sunmin Oh, Sang-Yun Oh, Gunwoong Park• 2026

Related benchmarks

TaskDatasetResultRank
Causal Structure LearningTheoretical Analysis
Worst-case Cost2
9
Transcription Factor (TF) to target path discovery validation (one-step)TCGA breast cancer mRNA expression dataset GRCh38 reference genome (Additional random partitions of features)
Forward Proportion (Fwd)7.8
7
Transcription Factor (TF) to target path discovery validation (two-step)TCGA breast cancer mRNA expression dataset Additional random partitions of features GRCh38 reference genome
Forward Proportion10.8
7
CPDAG recoveryHigh-dimensional SEMs synthetic Uniform noise d=5 p=10^4
F1 Score64.2
4
CPDAG recoveryHigh-dimensional synthetic SEMs (Normal noise, d=5)
F1 Score58
4
CPDAG recoveryHigh-dimensional synthetic SEMs Normal noise, d=1
F1 Score96.5
4
CPDAG recoveryHigh-dimensional synthetic SEMs (Uniform noise, d=1) (p=10^4)
F1 Score94.8
4
Causal Discovery (One-step paths)TCGA mRNA expression 2012
Fwd Score7.8
3
Causal Discovery (Two-step paths)TCGA mRNA expression 2012
Fwd Score0.108
3
CPDAG recoverySynthetic linear SEMs p=10,000, d=1, n=0.5p, t-distributed noise
F1 Score99.4
2
Showing 10 of 13 rows

Other info

Follow for update