SNAP: Sequential Non-Ancestor Pruning for Targeted Causal Effect Estimation With an Unknown Graph
About
Causal discovery can be computationally demanding for large numbers of variables. If we only wish to estimate the causal effects on a small subset of target variables, we might not need to learn the causal graph for all variables, but only a small subgraph that includes the targets and their adjustment sets. In this paper, we focus on identifying causal effects between target variables in a computationally and statistically efficient way. This task combines causal discovery and effect estimation, aligning the discovery objective with the effects to be estimated. We show that definite non-ancestors of the targets are not needed to learn the causal relations between the targets or to identify efficient adjustment sets. We sequentially identify and prune these definite non-ancestors with our Sequential Non-Ancestor Pruning (SNAP) framework, which can be used either as a preprocessing step for standard causal discovery methods or as a standalone sound and complete causal discovery algorithm. Our results on synthetic and real data show that both approaches substantially reduce the number of independence tests and the computation time without compromising the quality of the causal effect estimates.
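The pruning idea can be illustrated with a minimal sketch. It assumes the true DAG is already known, which is not the setting of the paper: SNAP works from conditional independence tests on an unknown graph. The sketch only shows which variables count as non-ancestors of the targets and what remains once they are dropped. The function names `ancestors_of` and `prune_non_ancestors` and the toy graph are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def ancestors_of(edges, targets):
    """Return the targets plus every node with a directed path into a target."""
    parents = defaultdict(set)
    for cause, effect in edges:
        parents[effect].add(cause)

    keep = set(targets)
    stack = list(targets)
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in keep:
                keep.add(parent)
                stack.append(parent)
    return keep


def prune_non_ancestors(edges, targets):
    """Drop every node that is not a target or an ancestor of a target."""
    keep = ancestors_of(edges, targets)
    kept_edges = [(a, b) for a, b in edges if a in keep and b in keep]
    return keep, kept_edges


# Hypothetical toy DAG (an assumption, not from the paper):
# Z -> X -> Y, X -> W, Y -> V, with targets X and Y.
edges = [("Z", "X"), ("X", "Y"), ("X", "W"), ("Y", "V")]
keep, kept_edges = prune_non_ancestors(edges, targets={"X", "Y"})
```

In this toy graph, `W` and `V` are descendants of the targets and are removed, while `Z` is kept because it is an ancestor of `X`. The subgraph over `keep` still contains the edge between the two targets and the one variable that could serve in an adjustment set.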
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Causal Structure Learning | Synthetic nD=10000, d=2, dmax=10, 100 nodes | Number of CI tests | 5.11 | 13 |
| Causal Structure Learning | Synthetic nD=10000, d=2, dmax=10, 200 nodes | Number of CI tests | 20.02 | 13 |
| Causal Structure Learning | Synthetic nD=10000, d=2, dmax=10, 400 nodes | Number of CI tests | 79.98 | 13 |
| Causal Structure Learning | Synthetic nD=10000, d=2, dmax=10, 600 nodes | Number of CI tests | 179.9 | 11 |
| Causal Structure Learning | Synthetic nD=10000, d=2, dmax=10, 800 nodes | Number of CI tests | 3.20e+5 | 11 |
| Causal Discovery | Binary data 10 nodes, nD=1000, d=2, dmax=10 | Number of CI tests | 92.52 | 7 |
| Causal Discovery | Binary data 20 nodes, nD=1000, d=2, dmax=10 | Number of CI tests | 242.8 | 7 |
| Local Causal Discovery | Linear Gaussian 100 nodes | Number of CI tests (x10^3) | 5.01e+3 | 7 |
| Local Causal Discovery | Linear Gaussian 200 nodes | Number of CI tests (x10^3) | 19.93 | 7 |
| Local Causal Discovery | Linear Gaussian 400 nodes | Number of CI tests (x10^3) | 79.81 | 7 |