
On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study

About

Counterfactual reasoning has emerged as a crucial technique for generalizing the reasoning capabilities of large language models (LLMs). By generating and analyzing counterfactual scenarios, researchers can assess the adaptability and reliability of model decision-making. Although prior work has shown that LLMs often struggle with counterfactual reasoning, it remains unclear which factors most significantly impede their performance across different tasks and modalities. In this paper, we propose a decompositional strategy that breaks counterfactual generation down into stages, from causality construction to reasoning over counterfactual interventions. To support this decompositional analysis, we investigate \ntask datasets spanning diverse tasks, including natural language understanding, mathematics, programming, and vision-language tasks. Through extensive evaluations, we characterize LLM behavior at each decompositional stage and identify how modality type and intermediate reasoning influence performance. By establishing a structured framework for analyzing counterfactual reasoning, this work contributes to the development of more reliable LLM-based reasoning systems and informs future elicitation strategies.
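The staged evaluation the abstract describes can be illustrated with a minimal sketch. All names below (`CounterfactualCase`, `identify_causal_variables`, `reason_over_intervention`) are hypothetical and invented for illustration; the paper's actual pipeline, prompts, and datasets are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class CounterfactualCase:
    premise: str       # the factual scenario
    intervention: str  # the counterfactual change applied to it
    question: str      # what to reason about after the change

def identify_causal_variables(case: CounterfactualCase) -> list[str]:
    """Stage 1 (sketch): pick out entities the intervention acts on.

    A real system would query an LLM or a causal graph; this toy
    version just keeps capitalized tokens as candidate variables.
    """
    return [w.strip(".,") for w in case.intervention.split() if w.istitle()]

def reason_over_intervention(case: CounterfactualCase,
                             variables: list[str]) -> str:
    """Stage 2 (sketch): assemble premise + intervention into one prompt
    that a downstream model would answer."""
    return (f"Premise: {case.premise}\n"
            f"Intervene on {', '.join(variables) or 'scenario'}: "
            f"{case.intervention}\n"
            f"Question: {case.question}")

case = CounterfactualCase(
    premise="The match was struck and the candle lit.",
    intervention="Suppose the Match had been wet.",
    question="Would the candle still be lit?")
prompt = reason_over_intervention(case, identify_causal_variables(case))
print(prompt)
```

Separating the two stages this way is what lets failures be attributed to either causality construction or the subsequent reasoning step, rather than to the pipeline as a whole.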

Shuai Yang, Qi Yang, Luoxi Tang, Yuqiao Meng, Nancy Guo, Jeremy Blackburn, Zhaohan Xi • 2025

Related benchmarks

Task                            Dataset        Result  Rank
Causal Variable Identification  CRASS          --      7
Causal Variable Identification  CLOMO          --      7
Causal Variable Identification  RNN-Topo       --      7
Causal Variable Identification  CVQA-Bool      --      7
Causal Variable Identification  CVQA Count     --      7
Causal Variable Identification  COCO           --      7
Causal Variable Identification  Arithmetic     --      7
Causal Variable Identification  MalAlgoQA      --      7
Causal Variable Identification  HumanEval Exe  --      7
Causal Variable Identification  Open-Critic    --      7

Showing 10 of 22 rows
