
IV Co-Scientist: Multi-Agent LLM Framework for Causal Instrumental Variable Discovery

About

In the presence of confounding between an endogenous variable and the outcome, instrumental variables (IVs) are used to isolate the causal effect of the endogenous variable. Identifying valid instruments requires interdisciplinary knowledge, creativity, and contextual understanding, making it a non-trivial task. In this paper, we investigate whether large language models (LLMs) can aid in this task. We use a two-stage evaluation framework. First, we test whether LLMs can recover well-established instruments from the literature, assessing their ability to replicate standard reasoning. Second, we evaluate whether LLMs can identify and avoid instruments that have been empirically or theoretically discredited. Building on these results, we introduce IV Co-Scientist, a multi-agent system that proposes, critiques, and refines IVs for a given treatment-outcome pair. We also introduce a statistical test to contextualize consistency in the absence of ground truth. Our results show the potential of LLMs to discover valid instrumental variables from a large observational database.
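
To make the role of an instrument concrete, here is a minimal sketch of two-stage least squares (2SLS) on simulated data. It is a textbook illustration only and is not taken from the paper: the data-generating process, the coefficient values, and the function names `simulate`, `ols_slope`, and `two_stage_least_squares` are all assumptions chosen for the example. The instrument `z` influences the treatment `x`, is independent of the unobserved confounder `u`, and has no direct path to the outcome `y`.

```python
import numpy as np


def simulate(n=20000, true_effect=2.0, seed=0):
    """Simulate a confounded treatment-outcome pair with a valid instrument.

    All coefficients here are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)  # unobserved confounder (affects both x and y)
    z = rng.normal(size=n)  # instrument: affects x only, independent of u
    x = 1.0 * z + 1.0 * u + rng.normal(size=n)
    y = true_effect * x + 2.0 * u + rng.normal(size=n)
    return z, x, y


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def two_stage_least_squares(z, x, y):
    """2SLS with a single instrument z for a single endogenous variable x."""
    # Stage 1: regress x on z and keep the fitted values,
    # i.e. the part of x that is explained by the instrument.
    Z = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ coef
    # Stage 2: regress y on the fitted values from stage 1.
    return ols_slope(x_hat, y)


if __name__ == "__main__":
    z, x, y = simulate()
    print("naive OLS estimate:", ols_slope(x, y))
    print("2SLS estimate:     ", two_stage_least_squares(z, x, y))
```

In this simulated setting the naive regression of `y` on `x` absorbs part of the confounder's influence and overstates the effect, while the 2SLS estimate lands close to the simulated effect of 2.0. The sketch assumes the instrument is valid by construction; deciding whether a proposed real-world instrument satisfies those conditions is the hard part that the abstract describes.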

Ivaxi Sheth, Zhijing Jin, Bryan Wilder, Dominik Janzing, Mario Fritz • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Identifying flawed instruments | GDP → Conflict | HG Score: 1 | 5 |
| Identifying flawed instruments | BMI → SBP | HG Score: 1 | 5 |
| Identifying flawed instruments | Church → Crime | HG Score: 100 | 5 |
| Identifying flawed instruments | Turnout → Vote Share | HG Score: 1 | 5 |
| Identifying flawed instruments | Protests → Prices | HG Score: 1 | 5 |
| Instrumental Variable Discovery | Gapminder GDP → Health | Relevance: 14.28 | 5 |
| Instrumental Variable Discovery | Gapminder Income → Emissions | Relevance: 17.52 | 5 |
| Instrumental Variable Discovery | Gapminder Sanitation → Mortality | Relevance: 11.37 | 5 |
| Instrumental Variable Discovery | Gapminder Poverty → Cholesterol | Relevance: 13.44 | 5 |
| Instrumental Variable Discovery | Gapminder Female literacy → Kids | Relevance: 19.81 | 5 |
Showing 10 of 15 rows
