Debiased Counterfactual Generation via Flow Matching from Observations
About
Estimating counterfactual distributions under interventions is central to treatment risk assessment and counterfactual generation tasks. Existing approaches model the counterfactual distribution as a standalone generative target, without exploiting its relationship to the observational data. In this work, we show that under standard assumptions, observational and counterfactual outcome distributions are tightly linked: they have identical support and tail behavior, remain statistically close under weak confounding, and share any features of high-dimensional outcomes that are invariant to confounders. These properties motivate learning counterfactual distributions not from scratch, but via a deconfounding flow from the observational distribution. We formulate this problem via flow matching and derive a semiparametrically efficient estimator based on a novel efficient influence function correction. We subsequently extend our estimator to target minimal-energy flows in high dimensions, which we show can be especially simple targets between observational and counterfactual distributions. In experiments, deconfounding flows outperform existing debiased counterfactual distribution estimators, while also mitigating known failure modes of flow-based methods.
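For orientation, the generic flow-matching objective with a linear interpolation path can be written as below. This is a sketch of the standard formulation only, not the debiased estimator described in the abstract: the symbols $p_{\mathrm{obs}}$, $p_{\mathrm{cf}}$, $v_\theta$, and the independent pairing of $x_0$ with $x_1$ are assumptions introduced here for illustration, and the efficient influence function correction is not shown.

$$
x_t = (1 - t)\,x_0 + t\,x_1, \qquad
\mathcal{L}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_0 \sim p_{\mathrm{obs}},\; x_1 \sim p_{\mathrm{cf}}}
\left\| v_\theta(x_t, t) - (x_1 - x_0) \right\|^2
$$

Under this formulation, samples would be generated by integrating $\mathrm{d}x/\mathrm{d}t = v_\theta(x, t)$ from $t = 0$ to $t = 1$, starting from an observational draw. Because counterfactual outcomes are not observed directly, the expectation over $p_{\mathrm{cf}}$ has to be estimated from observational data, which is presumably where a debiasing correction enters.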
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Counterfactual Distribution Estimation | 401k SplitSupport outcome (synthetic) | Average W1 Error | 4 | 6 |
| Counterfactual Distribution Estimation | TWINS HeavyTails outcome (synthetic) | Average W1 Error | 0.25 | 6 |
| Counterfactual Distribution Estimation | ACIC Challenge outcome 16 (test) | Average W1 Error | 0.35 | 6 |
| Counterfactual Distribution Estimation | ACIC Gaussian outcome 2019 (synthetic) | Average W1 Error | 0.05 | 6 |
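
The metric in the table is an average Wasserstein-1 (W1) error. The sketch below shows one way such a number could be computed for scalar outcomes, assuming equal-size generated and reference samples per treatment arm. The function names, the averaging over arms, and the toy data are all hypothetical; the benchmark's actual averaging scheme is not stated here.

```python
def wasserstein1_1d(xs, ys):
    """Empirical W1 distance between two equal-size samples of scalars.

    For equal-size 1D samples, W1 equals the mean absolute difference
    between the sorted values (order statistics).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted)) / len(xs)


def average_w1_error(generated_by_arm, reference_by_arm):
    """Average W1 over treatment arms (assumed aggregation, for illustration)."""
    arms = sorted(reference_by_arm)
    total = sum(
        wasserstein1_1d(generated_by_arm[arm], reference_by_arm[arm])
        for arm in arms
    )
    return total / len(arms)


# Toy data only: arm 0 is shifted by 0.5, arm 1 matches exactly.
generated = {0: [0.0, 1.0, 2.0], 1: [3.0, 1.0, 2.0]}
reference = {0: [0.5, 1.5, 2.5], 1: [1.0, 2.0, 3.0]}
error = average_w1_error(generated, reference)
print(error)
```

Sorting makes the distance invariant to sample order, which is why arm 1 contributes zero even though its values are listed in a different order. For unequal sample sizes or multivariate outcomes a different estimator would be needed.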