CausalAbstain: Enhancing Multilingual LLMs with Causal Reasoning for Trustworthy Abstention
About
Large Language Models (LLMs) often exhibit knowledge disparities across languages. Encouraging LLMs to *abstain* when faced with knowledge gaps is a promising strategy to reduce hallucinations in multilingual settings. Current abstention strategies for multilingual scenarios primarily rely on generating feedback in various languages with LLMs and performing self-reflection. However, these methods can be adversely affected by inaccuracies and biases in the generated feedback. To address this, we introduce *CausalAbstain*, a method that takes a causal perspective to help LLMs determine whether to use multiple generated feedback responses and how to identify the most helpful ones. Extensive experiments demonstrate that *CausalAbstain* effectively selects helpful feedback and enhances abstention decisions with interpretability in both native-language (*Causal-native*) and multilingual (*Causal-multi*) settings, outperforming strong baselines on two benchmark datasets covering encyclopedic and commonsense knowledge QA tasks. Our code and data are open-sourced at https://github.com/peachch/CausalAbstain.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multilingual Commonsense Reasoning | M-Hellaswag | Accuracy (zh) | 79.2 | 21 |
| Multilingual Knowledge Evaluation | m-MMLU | Accuracy (zh) | 80.5 | 21 |
| Multilingual Language Understanding | M-MMLU (test) | Accuracy (zh) | 61.2 | 14 |