Zero-Shot Textual Explanations via Translating Decision-Critical Features

About

Textual explanations make image classifier decisions transparent by describing the prediction rationale in natural language. Large vision-language models can generate captions but are designed for general visual understanding, not classifier-specific reasoning. Existing zero-shot explanation methods align global image features with language, producing descriptions of what is visible rather than what drives the prediction. We propose TEXTER, which overcomes this limitation by isolating decision-critical features before alignment. TEXTER identifies the neurons contributing to the prediction and emphasizes the features encoded in those neurons -- i.e., the decision-critical features. It then maps these emphasized features into the CLIP feature space to retrieve textual explanations that reflect the model's reasoning. A sparse autoencoder further improves interpretability, particularly for Transformer architectures. Extensive experiments show that TEXTER provides more faithful and interpretable explanations than existing methods. The code is available at https://github.com/tttt-0814/TEXTER.
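The pipeline described in the abstract (find contributing neurons, emphasize their features, map into a shared text space, retrieve text) can be illustrated with a toy sketch. This is not the authors' implementation: the contribution score (activation times class weight), the `scale` factor, the linear projection `to_clip`, and the two-dimensional "text embeddings" are all assumptions chosen for illustration, and no real classifier or CLIP model is involved.

```python
import math

def top_k_contributing_neurons(activations, class_weights, k):
    """Rank neurons by activation * class weight (an assumed contribution score)."""
    contributions = [a * w for a, w in zip(activations, class_weights)]
    ranked = sorted(range(len(contributions)),
                    key=lambda i: contributions[i], reverse=True)
    return ranked[:k]

def emphasize(activations, neuron_ids, scale=5.0):
    """Amplify the selected neurons; leave the others unchanged (assumed scheme)."""
    keep = set(neuron_ids)
    return [a * scale if i in keep else a for i, a in enumerate(activations)]

def project(features, matrix):
    """Apply a linear map; `matrix` holds one row per output dimension."""
    return [sum(m * f for m, f in zip(row, features)) for row in matrix]

def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query, text_embeddings, top_n=1):
    """Return the top_n texts whose embeddings are most similar to the query."""
    ranked = sorted(text_embeddings.items(),
                    key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_n]]

# Toy data (hypothetical): four penultimate-layer neurons of a classifier.
activations = [0.9, 0.1, 0.8, 0.2]      # neuron 0 fires strongly on background
class_weights = [0.1, 0.2, 1.5, -0.3]   # but neuron 2 drives the predicted class
to_clip = [[1, 0, 0, 0],                # assumed linear map into a 2-D text space
           [0, 0, 1, 0]]
texts = {"green grass": [1.0, 0.0], "striped fur": [0.0, 1.0]}

critical = top_k_contributing_neurons(activations, class_weights, k=1)
global_query = project(activations, to_clip)
critical_query = project(emphasize(activations, critical), to_clip)

print(retrieve(global_query, texts))    # ['green grass']
print(retrieve(critical_query, texts))  # ['striped fur']
```

In this toy setup, retrieval from the unmodified features returns the most visible concept, while retrieval from the emphasized features returns the concept tied to the neuron that contributes most to the prediction, which mirrors the distinction the abstract draws between what is visible and what drives the decision.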

Toshinori Yamauchi, Hiroshi Kera, Kazuhiko Kawamoto • 2025

Related benchmarks

Task	Dataset	Result
Textual Explanation	ImageNet-1K misclassified cases	Mean Score: 0.668	40
Faithfulness Evaluation	ImageNet-1K	Insertion Score: 0.204	40
Textual Explanation Generation	ImageNet-1K	CLIP Score: 0.3113	30
Misclassification Explanation Evaluation	ImageNet-1K misclassified samples 1.0 (test)	Directional Score (Mean): 0.723	20
Textual explanation faithfulness evaluation	ImageNet-1K 1,000 images sampled (test)	Insertion Score: 0.159	20
Textual Explanation Semantic Consistency	Pascal VOC	CLIP-Score: 0.234	15
