
Text-to-Image Models for Counterfactual Explanations: a Black-Box Approach

About

This paper addresses the challenge of generating Counterfactual Explanations (CEs): identifying and modifying the fewest features necessary to alter a classifier's prediction for a given image. Our proposed method, Text-to-Image Models for Counterfactual Explanations (TIME), is a black-box counterfactual technique based on distillation. Unlike previous methods, this approach requires only the image and its prediction, with no access to the classifier's structure, parameters, or gradients. Before generating the counterfactuals, TIME introduces two distinct biases into Stable Diffusion in the form of textual embeddings: the context bias, associated with the image's structure, and the class bias, linked to class-specific features learned by the target classifier. After learning these biases, we find the optimal latent code using the classifier's predicted class token and regenerate the image conditioned on the target embedding, producing the counterfactual explanation. Extensive empirical studies validate that TIME generates explanations of comparable effectiveness even when operating in a black-box setting.
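The pipeline the abstract describes — distill the black-box classifier's behaviour into per-class embeddings, invert the query image to a latent code under its predicted class, then regenerate under the target embedding — can be caricatured with a toy generator. This is only an illustrative sketch of the idea, not the paper's implementation: the "images" are 2-D points, the "class bias" is a learned mean shift standing in for the textual embeddings TIME optimises inside Stable Diffusion, and all names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Black-box classifier: we may only query its predictions,
# never its weights or gradients.
def classify(x):
    """Toy binary classifier: class 1 iff the first coordinate is positive."""
    return int(x[0] > 0)

# Unlabelled "image" pool (2-D points standing in for images).
pool = np.concatenate([
    rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(200, 2)),  # class-0 cluster
    rng.normal(loc=[+2.0, 0.0], scale=0.5, size=(200, 2)),  # class-1 cluster
])

# Distillation step: label the pool with the black-box classifier, then learn
# one "class bias" per predicted class (here simply the class mean).
labels = np.array([classify(x) for x in pool])
class_bias = {c: pool[labels == c].mean(axis=0) for c in (0, 1)}

def counterfactual(x):
    """Invert x to a latent under its predicted class, then regenerate
    with the opposite class embedding (toy generator: latent + class bias)."""
    source = classify(x)
    target = 1 - source
    latent = x - class_bias[source]      # "optimal latent code" for x
    return latent + class_bias[target]   # regenerate with target conditioning

query = np.array([-1.8, 0.3])            # classifier predicts class 0
ce = counterfactual(query)
print(classify(query), classify(ce))     # → 0 1: the prediction flips
```

The key black-box property carries over: `classify` is only ever called to obtain labels, so no classifier internals are needed, and the edit is minimal in the sense that only the class-dependent component of the input changes.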

Guillaume Jeanneret, Loïc Simon, Frédéric Jurie • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Reconstruction | FFHQ No glasses | LPIPS | 0.025 | 18 |
| Image Reconstruction | FFHQ Glasses | LPIPS | 0.026 | 18 |
| Image Editing (Add glasses) | FFHQ (test) | ID-Sim | 0.686 | 15 |
| Image Editing (Remove glasses) | FFHQ (test) | ID-Sim | 0.692 | 15 |
| Attribute Classification | FFHQ (test) | Accuracy | 83.4 | 15 |
| Counterfactual Visual Explanation | BDD100K | FID | 51.5 | 10 |
| Visual Counterfactual Explanation (Smile) | CelebA-HQ | FID | 10.98 | 9 |
| Visual Counterfactual Explanation (Age) | CelebA-HQ | FID | 20.9 | 9 |
