CausalPFN: Amortized Causal Effect Estimation via In-Context Learning
About
Causal effect estimation from observational data is fundamental across various applications. However, selecting an appropriate estimator from dozens of specialized methods demands substantial manual effort and domain expertise. We present CausalPFN, a single transformer that amortizes this workflow: trained once on a large library of simulated data-generating processes that satisfy ignorability, it infers causal effects for new observational datasets out of the box. CausalPFN combines ideas from Bayesian causal inference with the large-scale training protocol of prior-fitted networks (PFNs), learning to map raw observations directly to causal effects without any task-specific adjustment. Our approach achieves superior average performance on heterogeneous and average treatment effect estimation benchmarks (IHDP, Lalonde, ACIC). Moreover, it shows competitive performance for real-world policy making on uplift modeling tasks. CausalPFN provides calibrated uncertainty estimates to support reliable decision-making based on Bayesian principles. This ready-to-use model requires no further training or tuning and takes a step toward automated causal inference (https://github.com/vdblm/CausalPFN/).
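As a rough illustration of what a simulated data-generating process satisfying ignorability can look like, the sketch below draws one synthetic dataset in which treatment assignment depends only on an observed covariate. It is not CausalPFN's training prior or its API: the function name `simulate_ignorable_dgp`, the functional forms, and all coefficients are assumptions chosen for illustration. It compares a naive difference in means with an inverse-propensity-weighted (Hajek) estimate of the average treatment effect.

```python
import numpy as np

def simulate_ignorable_dgp(n, seed=0):
    """Hypothetical toy DGP: treatment depends only on the observed covariate x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                        # observed confounder
    propensity = 1.0 / (1.0 + np.exp(-1.5 * x))   # P(T=1 | x), strictly in (0, 1)
    t = rng.binomial(1, propensity)               # treatment assignment
    tau = 1.0 + 0.5 * x                           # heterogeneous effect (CATE)
    y = 2.0 * x + tau * t + rng.normal(scale=0.5, size=n)  # observed outcome
    return x, t, y, propensity, tau

x, t, y, e, tau = simulate_ignorable_dgp(50_000)

# Ground-truth ATE is available only because the data are simulated.
true_ate = tau.mean()

# Naive difference in means ignores confounding through x.
naive_ate = y[t == 1].mean() - y[t == 0].mean()

# Hajek (self-normalized) inverse-propensity-weighted estimate,
# here using the known simulation propensity.
w1 = t / e
w0 = (1 - t) / (1 - e)
ipw_ate = np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

print(f"true ATE {true_ate:.3f} | naive {naive_ate:.3f} | IPW {ipw_ate:.3f}")
```

Because the confounder is observed and the propensity is bounded away from 0 and 1, adjusting for `x` recovers the effect, whereas the naive contrast does not. Conventional estimators need this kind of adjustment to be specified for each dataset; the abstract describes a model trained across many such simulated processes so that it can estimate effects on new data without that per-dataset step.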
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Average Treatment Effect (ATE) Estimation | IHDP, ACIC, Lalonde CPS PSID 2016 | ATE Error (IHDP) | 0.2 | 13 |
| Conditional Average Treatment Effect (CATE) Estimation | IHDP, ACIC 2016, Lalonde CPS, Lalonde PSID | IHDP Error Metric | 0.58 | 12 |
| Uplift Modeling | Lenta 50k stratified (subsample) | Normalized Qini Score | 1 | 5 |
| Uplift Modeling | Hillstrom 64k rows (full) | Normalized Qini Score | 99.2 | 5 |
| Uplift Modeling | Criteo stratified 50k (subsample) | Normalized Qini Score | 85.9 | 5 |
| Uplift Modeling | Hillstrom Hill(2) 64k rows (full) | Normalized Qini Score | 0.968 | 5 |
| Uplift Modeling | Megafon Mega 50k stratified (subsample) | Normalized Qini | 0.97 | 5 |
| Uplift Modeling | Retail Hero X5 50k stratified (subsample) | Normalized Qini Score | 0.922 | 5 |