
Provably Robust Bayesian Counterfactual Explanations under Model Changes

About

Counterfactual explanations (CEs) offer interpretable insights into machine learning predictions by answering "what if?" questions. However, in real-world settings where models are frequently updated, existing counterfactual explanations can quickly become invalid or unreliable. In this paper, we introduce Probabilistically Safe CEs (PSCE), a method for generating counterfactual explanations that are $\delta$-safe, ensuring high predictive confidence, and $\epsilon$-robust, ensuring low predictive variance. Grounded in Bayesian principles, PSCE provides formal probabilistic guarantees for CEs under model changes, captured by what we refer to as the $\langle \delta, \epsilon \rangle$-set. We integrate uncertainty-aware constraints into our optimization framework and validate our method empirically across diverse datasets. Compared against state-of-the-art Bayesian CE methods, PSCE produces counterfactual explanations that are not only more plausible and discriminative but also provably robust under model change.
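The $\langle \delta, \epsilon \rangle$-set described above can be illustrated with a minimal Monte Carlo sketch: given predictive functions sampled from a Bayesian posterior over models, a candidate counterfactual is $\delta$-safe if its mean target-class probability is at least $\delta$, and $\epsilon$-robust if the variance of that probability is at most $\epsilon$. All names below (`in_delta_eps_set`, the toy posterior) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def in_delta_eps_set(x_cf, posterior_predict_fns, target_class, delta, eps):
    """Monte Carlo membership check for the <delta, eps>-set:
    mean target-class probability over posterior model samples >= delta
    (delta-safe) and its variance <= eps (eps-robust)."""
    probs = np.array([f(x_cf)[target_class] for f in posterior_predict_fns])
    return bool(probs.mean() >= delta and probs.var() <= eps)

# Toy posterior: three sampled models, each mapping an input to class probabilities.
posterior = [
    lambda x: np.array([0.10, 0.90]),
    lambda x: np.array([0.20, 0.80]),
    lambda x: np.array([0.15, 0.85]),
]
x_cf = np.zeros(4)  # placeholder counterfactual candidate
print(in_delta_eps_set(x_cf, posterior, target_class=1, delta=0.8, eps=0.01))  # True
```

Here the mean target-class probability is 0.85 and the variance is roughly 0.0017, so the candidate satisfies both constraints; tightening $\delta$ to 0.9 would reject it.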

Jamie Duell, Xiuyi Fan • 2026

Related benchmarks

Task                          Dataset                 Metric           Result    Rank
Counterfactual Explanations   Breast Cancer (test)    IM1              1.554     16
Counterfactual Explanations   Credit (test)           IM1              0.7383    16
Counterfactual Explanations   Spam (test)             IM1              0.8967    16
Counterfactual Explanations   PneumoniaMNIST (test)   IM1              0.63      16
Counterfactual Explanations   MNIST (test)            IM1 Score        0.9768    11
Counterfactual Explanations   MNIST (test)            Validity Score   98.2      10
