InferenceEvolve: Towards Automated Causal Effect Estimators through Self-Evolving AI

About

Causal inference is central to scientific discovery, yet choosing appropriate methods remains challenging because of the complexity of both statistical methodology and real-world data. Inspired by the success of artificial intelligence in accelerating scientific discovery, we introduce InferenceEvolve, an evolutionary framework that uses large language models to discover and iteratively refine causal methods. Across widely used benchmarks, InferenceEvolve yields estimators that consistently outperform established baselines: against 58 human submissions in a recent community competition, our best evolved estimator lay on the Pareto frontier across two evaluation metrics. We also developed robust proxy objectives for settings without semi-synthetic outcomes, with competitive results. Analysis of the evolutionary trajectories shows that agents progressively discover sophisticated strategies tailored to unrevealed data-generating mechanisms. These findings suggest that language-model-guided evolution can optimize structured scientific programs such as causal inference, even when outcomes are only partially observed.
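The Pareto-frontier claim above can be illustrated with a minimal sketch. It assumes two error metrics where lower is better on both. The function name and the scores are hypothetical, invented for illustration, and are not taken from the competition or from the authors' code.

```python
from typing import List, Tuple


def pareto_frontier(points: List[Tuple[float, float]]) -> List[int]:
    """Return indices of points that no other point dominates.

    A point b dominates a when b is no worse on both metrics and
    strictly better on at least one (lower is better on both).
    """
    frontier = []
    for i, (a1, a2) in enumerate(points):
        dominated = any(
            (b1 <= a1 and b2 <= a2) and (b1 < a1 or b2 < a2)
            for j, (b1, b2) in enumerate(points)
            if j != i
        )
        if not dominated:
            frontier.append(i)
    return frontier


# Hypothetical (metric_1, metric_2) scores for five submissions.
scores = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.6, 0.6), (0.9, 0.9)]
print(pareto_frontier(scores))
```

Under this definition, an estimator on the frontier need not be the best on either metric alone; it only needs to avoid being beaten on both at once.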

Can Wang, Hongyu Zhao, Yiqun Chen • 2026

Related benchmarks

Task                                       Dataset    Metric                  Result  Rank
Treatment Effect Estimation                IHDP       PEHE Mean               1.392   27
Average Treatment Effect Estimation        IHDP       Best Achieved Error     0.063   2
Average Treatment Effect Estimation        ACIC 2016  Best Performance        0.087   2
Average Treatment Effect Estimation        Lalonde    Best Error              0.033   2
Heterogeneous Treatment Effect Estimation  ACIC 2016  Best Error              0.858   2
Heterogeneous Treatment Effect Estimation  Lalonde    Best Performance Score  0.693   2
Treatment Effect Estimation                ACIC 2022  Best Performance        14.41   2
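
The listing does not define the metrics in the table. As a rough guide, the sketch below shows one common convention for two of them: PEHE (precision in estimation of heterogeneous effects) as the root mean squared error of per-unit effect estimates, and ATE error as the absolute error of the average effect. Some papers report PEHE without the square root, so the exact formula behind each benchmark row is an assumption here. Function names and data are hypothetical.

```python
import math
from typing import Sequence


def pehe(tau_true: Sequence[float], tau_hat: Sequence[float]) -> float:
    """Root mean squared error between true and estimated per-unit effects."""
    n = len(tau_true)
    return math.sqrt(sum((h - t) ** 2 for t, h in zip(tau_true, tau_hat)) / n)


def ate_error(tau_true: Sequence[float], tau_hat: Sequence[float]) -> float:
    """Absolute difference between estimated and true average effects."""
    n = len(tau_true)
    return abs(sum(tau_hat) / n - sum(tau_true) / n)


# Hypothetical per-unit effects, as available only in (semi-)synthetic data.
tau_true = [1.0, 2.0, 3.0]
tau_hat = [1.5, 2.0, 2.5]
print(pehe(tau_true, tau_hat), ate_error(tau_true, tau_hat))
```

In this toy case the per-unit errors cancel in the average, so the ATE error is zero while PEHE is not. That is why heterogeneous-effect and average-effect results are reported in separate rows.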