WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts

About

Recent advancements in neural networks have showcased their remarkable capabilities across various domains. Despite these successes, the "black box" problem remains. To address it, we propose a novel framework, WWW, that explains the 'what', 'where', and 'why' of neural network decisions in human-understandable terms. Specifically, WWW uses adaptive selection for concept discovery, employing adaptive cosine similarity and thresholding techniques to explain 'what'. To address the 'where' and 'why', we propose a novel combination of neuron activation maps (NAMs) with Shapley values, generating localized concept maps and heatmaps for individual inputs. Furthermore, WWW introduces a method for predicting uncertainty, leveraging heatmap similarities to estimate 'how' reliable a prediction is. Experimental evaluations show that WWW outperforms existing interpretability methods in both quantitative and qualitative metrics. WWW provides a unified solution for explaining 'what', 'where', and 'why', introducing a method for deriving localized explanations from global interpretations and offering a plug-and-play solution adaptable to various architectures.
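The abstract names two similarity-based steps: picking concepts by cosine similarity with a threshold, and comparing heatmaps to judge how reliable a prediction is. The sketch below illustrates the general idea only and is not the authors' implementation. The function names, the `margin` parameter, and the rule "keep every concept within `margin` of the best score" are assumptions made for illustration; the paper's actual adaptive cosine similarity and thresholding may differ.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between each row of a (n, d) and each row of b (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def select_concepts(neuron_embedding, concept_embeddings, concept_names, margin=0.05):
    """Return (name, score) pairs for concepts close to the best-matching one.

    Hypothetical rule: keep concepts whose similarity is within `margin` of the
    highest score, so the cutoff adapts to each neuron instead of being fixed.
    """
    scores = cosine_similarity(neuron_embedding[None, :], concept_embeddings)[0]
    threshold = scores.max() - margin
    keep = np.flatnonzero(scores >= threshold)
    keep = keep[np.argsort(-scores[keep])]  # best match first
    return [(concept_names[i], float(scores[i])) for i in keep]


def heatmap_similarity(heatmap_a, heatmap_b):
    """Cosine similarity between two flattened heatmaps of the same shape."""
    va, vb = heatmap_a.ravel(), heatmap_b.ravel()
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))


# Toy data, made up for illustration: 2-d embeddings and 2x2 heatmaps.
neuron = np.array([1.0, 0.0])
concepts = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
names = ["dog", "puppy", "car"]
selected = select_concepts(neuron, concepts, names, margin=0.05)

map_a = np.array([[1.0, 0.0], [0.0, 0.0]])
map_b = np.array([[0.0, 1.0], [0.0, 0.0]])
print(selected)
print(heatmap_similarity(map_a, map_a), heatmap_similarity(map_a, map_b))
```

In this toy setup, a low similarity between two heatmaps that are expected to agree would be read as a sign of lower reliability. How WWW turns such similarities into an uncertainty estimate is not specified in the abstract.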

Yong Hyun Ahn, Hyeon Bae Kim, Seong Tae Kim • 2024

Related benchmarks

Task	Dataset	Result
Neuron Interpretation	ImageNet-1k (val)	CLIP Cosine Similarity 0.7792	18
Visual Grounding	ImageNet-1k (val)	Alignment Score 0.66	14
Latent Training Semantics Recovery	ImageNet 1k (train)	Semantic Similarity Score 43	10
Neuron Interpretation	Places365 (test)	CLIP Cosine Similarity 0.7617	7
Semantic class fidelity	ImageNet 1k (test)	RN50 Fidelity Score 0.6	5
