InferenceEvolve: Towards Automated Causal Effect Estimators through Self-Evolving AI

About

Causal inference is central to scientific discovery, yet choosing appropriate methods remains challenging because of the complexity of both statistical methodology and real-world data. Inspired by the success of artificial intelligence in accelerating scientific discovery, we introduce InferenceEvolve, an evolutionary framework that uses large language models to discover and iteratively refine causal methods. Across widely used benchmarks, InferenceEvolve yields estimators that consistently outperform established baselines: against 58 human submissions in a recent community competition, our best evolved estimator lay on the Pareto frontier across two evaluation metrics. We also developed robust proxy objectives for settings without semi-synthetic outcomes, with competitive results. Analysis of the evolutionary trajectories shows that agents progressively discover sophisticated strategies tailored to unrevealed data-generating mechanisms. These findings suggest that language-model-guided evolution can optimize structured scientific programs such as causal inference, even when outcomes are only partially observed.
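As an illustration of what "on the Pareto frontier across two evaluation metrics" means, here is a minimal sketch. It assumes both metrics are errors where lower is better; the function name `pareto_frontier` and the toy scores are hypothetical and are not taken from the paper or the competition.

```python
def pareto_frontier(points):
    """Return indices of points that no other point dominates.

    Assumption: each point is (metric_1, metric_2) and lower is better on both.
    A point is dominated if another point is at least as good on both metrics
    and strictly better on at least one.
    """
    frontier = []
    for i, (a1, a2) in enumerate(points):
        dominated = any(
            (b1 <= a1 and b2 <= a2) and (b1 < a1 or b2 < a2)
            for j, (b1, b2) in enumerate(points)
            if j != i
        )
        if not dominated:
            frontier.append(i)
    return frontier


# Toy scores for four hypothetical submissions (not real competition data).
scores = [(0.10, 0.90), (0.20, 0.50), (0.30, 0.60), (0.40, 0.40)]
print(pareto_frontier(scores))
```

In this toy example the third submission is dominated by the second, which is better on both metrics, so it falls off the frontier. A submission on the frontier need not be the best on either metric alone.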

Can Wang, Hongyu Zhao, Yiqun Chen • 2026

Related benchmarks

Task	Dataset	Result
Treatment Effect Estimation	IHDP	PEHE Mean: 1.392	27
Average Treatment Effect Estimation	IHDP	--	24
Average Treatment Effect Estimation	ACIC 2016	Best Performance: 0.087	2
Average Treatment Effect Estimation	Lalonde	Best Error: 0.033	2
Heterogeneous Treatment Effect Estimation	ACIC 2016	Best Error: 0.858	2
Heterogeneous Treatment Effect Estimation	Lalonde	Best Performance Score: 0.693	2
Treatment Effect Estimation	ACIC 2022	Best Performance: 14.41	2
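The table reports PEHE and error values without defining them. The sketch below uses the common textbook definitions, assuming PEHE is the root mean squared error of individual treatment effect estimates and ATE error is the absolute difference of mean effects. The benchmarks listed here may use variants (for example, squared PEHE or a normalized error); the function names and toy values are hypothetical.

```python
import math


def pehe(tau_true, tau_hat):
    """Root PEHE: sqrt of the mean squared error between estimated and true
    individual treatment effects (assumed definition)."""
    n = len(tau_true)
    return math.sqrt(sum((h - t) ** 2 for t, h in zip(tau_true, tau_hat)) / n)


def ate_abs_error(tau_true, tau_hat):
    """Absolute error of the average treatment effect (assumed definition)."""
    n = len(tau_true)
    return abs(sum(tau_hat) / n - sum(tau_true) / n)


# Toy effects, only available in (semi-)synthetic benchmarks where the
# true individual effects are known.
tau_true = [1.0, 2.0, 3.0, 4.0]
tau_hat = [1.5, 1.5, 3.5, 3.5]

print(pehe(tau_true, tau_hat))
print(ate_abs_error(tau_true, tau_hat))
```

The toy values show why the two metrics are reported separately: individual errors of ±0.5 cancel in the average, so the ATE error is zero while the PEHE is not.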

