IV-ICL: Bounding Causal Effects with Instrumental Variables via In-Context Learning

About

The instrumental-variables (IV) setting is standard for partial identification of causal effects when unobserved confounding makes point identification impossible. Existing approaches face methodological bottlenecks: closed-form bound estimands are required -- e.g., the Balke-Pearl equations in the binary IV setting -- and even when these are available, designing accurate estimators requires manual effort tailored to each estimand. While direct Bayesian inference of the causal effects, instead of the bounds, circumvents these challenges, it is often computationally intensive and suffers from high prior sensitivity or under-dispersed posteriors. As a remedy, we introduce IV-ICL, an amortized Bayesian in-context learning method that learns the marginal posterior distribution of the causal effects directly and derives bounds as its quantiles. Unlike standard variational inference, which optimizes the exclusive KL divergence, amortized Bayesian inference minimizes the expected inclusive KL, a mass-covering objective. We empirically observe that optimizing the inclusive KL can recover the entire identified set across diverse data-generating processes, while optimizing the exclusive KL (e.g., with variational inference) under the same Bayesian formulation collapses onto a single mode and fails to cover the identified set. We evaluate IV-ICL on synthetic and semi-synthetic IV benchmarks and show that it produces intervals that are more reliably valid and more informative than those of efficient semi-parametric, Bayesian, and plug-in baselines, at 20-500x lower inference time. Beyond methodology, we propose a procedure to convert randomized controlled trials into IV benchmarks with provably preserved ground-truth causal effects, which enables a more realistic evaluation of partial-identification methods.
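The contrast between the two KL directions, and the idea of reading bounds off as posterior quantiles, can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the bimodal target, the single-Gaussian approximating family, the grid search, and the names `fit_gaussian` and `interval` are all hypothetical. It also fits one fixed target rather than minimizing an expected inclusive KL over simulated datasets, so it says nothing about how IV-ICL itself is implemented.

```python
from math import log
from statistics import NormalDist

# Toy bimodal target p(x): a hypothetical stand-in for a posterior whose mass
# sits at two separated effect values. Not the paper's model.
MODES = (NormalDist(-2.0, 0.5), NormalDist(2.0, 0.5))
DX = 0.05
XS = [-8.0 + DX * i for i in range(321)]  # integration grid on [-8, 8]
P = [0.5 * MODES[0].pdf(x) + 0.5 * MODES[1].pdf(x) for x in XS]
TINY = 1e-300  # floor that keeps log() finite when a density underflows


def kl(a, b):
    """Riemann-sum approximation of KL(a || b) for densities tabulated on XS."""
    return sum(ai * (log(ai) - log(max(bi, TINY))) * DX
               for ai, bi in zip(a, b) if ai > TINY)


def fit_gaussian(inclusive):
    """Grid-search the Gaussian q minimizing KL(p||q) or KL(q||p)."""
    best = None
    for mu in [-3.0 + 0.25 * i for i in range(25)]:
        for sigma in [0.3 + 0.1 * j for j in range(28)]:
            q = NormalDist(mu, sigma)
            q_tab = [q.pdf(x) for x in XS]
            # Inclusive KL averages over p; exclusive KL averages over q.
            loss = kl(P, q_tab) if inclusive else kl(q_tab, P)
            if best is None or loss < best[0]:
                best = (loss, q)
    return best[1]


def interval(q, level=0.90):
    """Central quantile interval of q, read off as lower/upper bounds."""
    tail = (1.0 - level) / 2.0
    return q.inv_cdf(tail), q.inv_cdf(1.0 - tail)


q_inclusive = fit_gaussian(inclusive=True)
q_exclusive = fit_gaussian(inclusive=False)
print("inclusive KL fit:", q_inclusive, interval(q_inclusive))
print("exclusive KL fit:", q_exclusive, interval(q_exclusive))
```

On this toy target, the inclusive fit should spread out so that its quantile interval spans both modes, whereas the exclusive fit should settle on one mode and yield a much narrower interval. This mirrors, in a very simplified form, the mass-covering versus mode-seeking behavior described in the abstract.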

Vahid Balazadeh, Hamidreza Kamkari, Medha Barath, Ricardo Silva, Rahul G. Krishnan • 2026

Related benchmarks

Task	Dataset	Metric	Value	
Causal effect estimation	STAR math scores Regular+Aide vs. Regular class sizes (Weak instrument ρ ≈ 0.28)	Validity	1	6
Causal effect estimation	STAR math scores Regular+Aide vs. Regular class sizes (Strong instrument ρ ≈ 0.89)	Validity	1	6
Causal effect estimation	Project STAR Reading scores Weak instrument	Validity	1	6
Causal effect estimation	Project STAR Reading scores, Strong instrument	Validity	1	6
Instrumental Variable Estimation	Airplane demand modified binary (n=2048 samples)	Validity	1	6
Instrumental Variable Estimation	STAR math scores Small vs. Regular class sizes Weak instrument, ρ(Z, T) ≈ 0.29	Validity	100	6
Instrumental Variable Estimation	STAR Strong instrument math scores Small vs. Regular class sizes	Validity Score	1	6
Partial identification of causal effects	Synthetic Binary-outcome ground-truth bounds known	Validity	100	6
Partial identification of causal effects	Jobs semi-synthetic RCT-derived labels	Validity	100	6
Partial identification under instrumental variables	STAR small vs. regular class size reading scores Weak instrument ρ(Z, T) ≈ 0.29	Validity	1	6

Showing 10 of 11 rows
