
WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts

About

Recent advancements in neural networks have showcased their remarkable capabilities across various domains. Despite these successes, the "black box" problem remains. To address this, we propose a novel framework, WWW, that explains the 'what', 'where', and 'why' of neural network decisions in human-understandable terms. Specifically, WWW utilizes adaptive selection for concept discovery, employing adaptive cosine similarity and thresholding techniques to effectively explain 'what'. To address the 'where' and 'why', we propose a novel combination of neuron activation maps (NAMs) with Shapley values, generating localized concept maps and heatmaps for individual inputs. Furthermore, WWW introduces a method for predicting uncertainty, leveraging heatmap similarities to estimate 'how' reliable the prediction is. Experimental evaluations of WWW demonstrate superior performance on both quantitative and qualitative metrics, outperforming existing methods in interpretability. WWW provides a unified solution for explaining 'what', 'where', and 'why', introducing a method for deriving localized explanations from global interpretations and offering a plug-and-play solution adaptable to various architectures.

Yong Hyun Ahn, Hyeon Bae Kim, Seong Tae Kim • 2024
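
The abstract names two similarity-based ingredients: cosine similarity with an adaptive threshold for choosing a neuron's concepts, and heatmap similarity as a signal of how reliable a prediction is. The sketch below illustrates the general idea only. It is not the authors' implementation: the function names, the toy vectors, and the "keep everything within a fraction of the best score" rule (the `ratio` parameter) are assumptions made for illustration, since the abstract does not specify the exact thresholding rule or how heatmaps are compared.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_concepts(neuron_vec, concept_vecs, ratio=0.9):
    """Return concept names whose similarity is at least `ratio` times the best score.

    Assumed rule, not taken from the paper: the cutoff is relative to each
    neuron's own top score, so it adapts per neuron instead of being fixed.
    """
    scores = {name: cosine(neuron_vec, vec) for name, vec in concept_vecs.items()}
    best = max(scores.values())
    if best <= 0:
        return []  # no candidate concept points in the neuron's direction
    threshold = ratio * best
    kept = [name for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda name: -scores[name])  # most similar first


def heatmap_agreement(heatmap_a, heatmap_b):
    """Cosine similarity between two flattened 2D heatmaps (lists of rows).

    Assumed proxy for reliability: a sample's heatmap that agrees with a
    reference heatmap for the predicted class suggests a more trustworthy
    prediction.
    """
    flat_a = [x for row in heatmap_a for x in row]
    flat_b = [x for row in heatmap_b for x in row]
    return cosine(flat_a, flat_b)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real text/image embeddings.
    concepts = {"dog": [1.0, 0.0], "puppy": [0.9, 0.1], "car": [0.0, 1.0]}
    print(select_concepts([1.0, 0.0], concepts))
    print(heatmap_agreement([[1, 0], [0, 1]], [[1, 0], [0, 0]]))
```

A relative threshold is used here so that neurons with uniformly low similarity scores still receive their closest concepts, while neurons with one dominant concept do not pick up weak matches. In practice the vectors would come from an embedding model and the heatmaps from an attribution method such as the NAM and Shapley combination described in the abstract.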

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Neuron Interpretation | ImageNet-1k (val) | CLIP Cosine Similarity | 0.7792 | 18 |
| Visual Grounding | ImageNet-1k (val) | Alignment Score | 0.66 | 14 |
| Latent Training Semantics Recovery | ImageNet 1k (train) | Semantic Similarity Score | 43 | 10 |
| Neuron Interpretation | Places365 (test) | CLIP Cosine Similarity | 0.7617 | 7 |
| Semantic class fidelity | ImageNet 1k (test) | RN50 Fidelity Score | 0.6 | 5 |