MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations

About

Visual counterfactual explanations aim to reveal the minimal semantic modifications that alter a model's prediction, providing causal and interpretable insights into deep neural networks. However, existing diffusion-based counterfactual generation methods are often computationally expensive, slow to sample, and imprecise in localizing the regions they modify. To address these limitations, we propose MaskDiME, a simple, fast, and effective diffusion framework that unifies semantic consistency and spatial precision through localized sampling. Our approach adaptively focuses on decision-relevant regions, producing localized and semantically consistent counterfactuals while preserving high image fidelity. MaskDiME is training-free, runs inference more than 30x faster than the baseline, and achieves comparable or state-of-the-art performance on five benchmark datasets spanning diverse visual domains, making it a practical and generalizable solution for efficient counterfactual explanation.
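The abstract describes restricting diffusion sampling to decision-relevant regions. As a rough illustration of that general idea, and not MaskDiME's actual algorithm (its mask construction and sampling schedule are not described here), the sketch below uses a common masked-diffusion pattern from inpainting-style samplers such as RePaint: at each step, the original image is re-noised to the current noise level, and the sampler's edited content is kept only inside a mask. The mask rule (top fraction of pixels by classifier-gradient magnitude) and all function names are hypothetical assumptions.

```python
import numpy as np


def saliency_mask(grad, keep_frac=0.1):
    """Hypothetical mask rule: keep the keep_frac fraction of pixels with the
    largest gradient magnitude. grad has shape (C, H, W); the returned mask
    has shape (1, H, W) so it broadcasts over channels."""
    mag = np.abs(grad).sum(axis=0)                        # per-pixel magnitude, (H, W)
    k = min(mag.size, max(1, int(round(keep_frac * mag.size))))  # pixels to keep
    idx = np.argpartition(mag.ravel(), -k)[-k:]           # indices of the k largest
    mask = np.zeros(mag.size, dtype=np.float32)
    mask[idx] = 1.0
    return mask.reshape(mag.shape)[None]


def noise_to_level(x0, alpha_bar, rng):
    """Sample from the DDPM forward process q(x_t | x_0) with cumulative alpha_bar."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def localized_step(x_prev_edit, x0, mask, alpha_bar_prev, rng):
    """Keep the sampler's edited sample inside the mask and a re-noised copy
    of the original image outside it, so changes stay spatially local."""
    x_prev_orig = noise_to_level(x0, alpha_bar_prev, rng)
    return mask * x_prev_edit + (1.0 - mask) * x_prev_orig
```

In a full sampler, `x_prev_edit` would come from one classifier-guided denoising step and `grad` from the classifier's gradient with respect to the current image; neither is shown here, and both are assumptions of this sketch. Blending at every step, rather than only at the end, lets the model harmonize the edited region with its unchanged surroundings.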

Changlu Guo, Anders Nymark Christensen, Anders Bjorholm Dahl, Morten Rieger Hannemose • 2026

Related benchmarks

Task	Dataset	Result
Visual Counterfactual Explanation (Age)	CelebA Standard	FID 0.77	11
Visual Counterfactual Explanation (Smile)	CelebA Standard	FID 0.71	11
Counterfactual Explanation	ImageNet (Zebra - Sorrel)	FID 32.5	11
Counterfactual Explanation	ImageNet (Cheetah - Cougar)	FID 37.4	11
Counterfactual Explanation	ImageNet (Egyptian Cat - Persian Cat)	FID 36.2	11
Counterfactual Visual Explanation	BDD100K	FID 3.19	10
Visual Counterfactual Explanation (Age)	CelebA-HQ	FID 4.43	9
Visual Counterfactual Explanation (Smile)	CelebA-HQ	FID 2.51	9
Counterfactual Visual Explanation	BDD-OIA	FID 5.43	7
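
All results above are reported as FID (Fréchet Inception Distance), where lower is better. For reference, the standard definition between two image sets, with Inception feature means $\mu_1, \mu_2$ and covariances $\Sigma_1, \Sigma_2$, is given below; which image sets each benchmark compares is not stated on this page.

$$
\mathrm{FID} = \lVert \mu_1 - \mu_2 \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_1 + \Sigma_2 - 2\left( \Sigma_1 \Sigma_2 \right)^{1/2} \right)
$$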
