Beware of the Simulated DAG! Causal Discovery Benchmarks May Be Easy To Game

About

Simulated DAG models may exhibit properties that, perhaps inadvertently, render their structure identifiable and unexpectedly affect structure learning algorithms. Here, we show that marginal variance tends to increase along the causal order for generically sampled additive noise models. We introduce varsortability as a measure of the agreement between the order of increasing marginal variance and the causal order. For commonly sampled graphs and model parameters, we show that the remarkable performance of some continuous structure learning algorithms can be explained by high varsortability and matched by a simple baseline method. Yet, this performance may not transfer to real-world data where varsortability may be moderate or dependent on the choice of measurement scales. On standardized data, the same algorithms fail to identify the ground-truth DAG or its Markov equivalence class. While standardization removes the pattern in marginal variance, we show that data generating processes that incur high varsortability also leave a distinct covariance pattern that may be exploited even after standardization. Our findings challenge the significance of generic benchmarks with independently drawn parameters. The code is available at https://github.com/Scriddie/Varsortability.
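The idea of varsortability can be illustrated with a minimal sketch. It assumes the measure is computed as the fraction of directed paths whose end node has a larger marginal variance than their start node, with ties counted as one half. The function name `varsortability`, its parameters `variances`, `adjacency`, and `tol`, and the tie handling are illustrative assumptions, not the authors' implementation; the repository linked above is the authoritative reference.

```python
import numpy as np


def varsortability(variances, adjacency, tol=1e-9):
    """Sketch: share of directed paths along which marginal variance increases.

    adjacency[i, j] != 0 is assumed to mean an edge i -> j.
    Ties (variance ratio within tol of 1) count as one half (an assumption).
    """
    var = np.asarray(variances, dtype=float)
    edges = (np.asarray(adjacency) != 0).astype(int)
    d = edges.shape[0]
    ratio = var[None, :] / var[:, None]  # ratio[i, j] = var_j / var_i
    reach = edges.copy()  # reach[i, j] > 0: a path of the current length i -> j
    n_paths = 0.0
    n_agree = 0.0
    for _ in range(d - 1):  # path lengths 1 .. d-1
        connected = reach > 0
        n_paths += connected.sum()
        n_agree += (connected & (ratio > 1 + tol)).sum()
        n_agree += 0.5 * (connected & (np.abs(ratio - 1) <= tol)).sum()
        reach = (reach @ edges > 0).astype(int)  # extend paths by one edge
    return n_agree / n_paths if n_paths else float("nan")


# Chain X0 -> X1 -> X2 with unit edge weights and unit-variance noise:
# the marginal variances are 1, 2, 3, so variance grows along the causal order.
chain = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])
raw_score = varsortability([1.0, 2.0, 3.0], chain)
standardized_score = varsortability([1.0, 1.0, 1.0], chain)
```

In this toy chain, the raw variances agree with the causal order on every path, while standardization turns every comparison into a tie, which is the effect the abstract describes for standardized data.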

Alexander G. Reisach, Christof Seiler, Sebastian Weichwald • 2021

Related benchmarks

Task | Dataset | Metric | Result | Rank
Causal Discovery | Synthetic DAGs | TPR | 0.96 | 125
DAG learning | Synthetic (test) | SID | 427 | 101
DAG learning | Synthetic DAGs (100 nodes, 400 edges) v1 | SHD | 95 | 51
Causal Discovery | Synthetic DAG data | TPR | 96 | 40
Causal Discovery | Synthetic DAG data (test) | TPR | 96 | 40
Learning Directed Acyclic Graphs | Synthetic DAGs Default: 100 nodes, 400 edges, ER, Gaussian, n=1000 1.0 (test) | SHD | 800 | 5
Learning Directed Acyclic Graphs | Synthetic DAGs Gumbel noise distribution 1.0 (test) | SHD | 796 | 5
Learning Directed Acyclic Graphs | Synthetic DAGs Scale-free graph type 1.0 (test) | SHD | 300 | 4
Learning Directed Acyclic Graphs | Synthetic DAGs High edge density: 1000 edges 1.0 (test) | SHD | 1.95e+3 | 4
Learning Directed Acyclic Graphs | Synthetic DAGs Dense root causes: p=0.5 1.0 (test) | SHD | 746 | 4
Showing 10 of 12 rows
