Invertible Concept-based Explanations for CNN Models with Non-negative Concept Activation Vectors

About

Convolutional neural network (CNN) models for computer vision are powerful but lack explainability in their most basic form. This deficiency remains a key challenge when applying CNNs in important domains. Recent work on explanations through feature importance of approximate linear models has moved from input-level features (pixels or segments) to features from mid-layer feature maps in the form of concept activation vectors (CAVs). CAVs contain concept-level information and could be learned via clustering. In this work, we rethink the ACE algorithm of Ghorbani et al., proposing an alternative invertible concept-based explanation (ICE) framework to overcome its shortcomings. Based on the requirements of fidelity (approximate models to target models) and interpretability (being meaningful to people), we design measurements and evaluate a range of matrix factorization methods with our framework. We find that non-negative concept activation vectors (NCAVs) from non-negative matrix factorization provide superior performance in interpretability and fidelity based on computational and human subject experiments. Our framework provides both local and global concept-level explanations for pre-trained CNN models.
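As a rough illustration of the factorization idea in the abstract, the sketch below applies non-negative matrix factorization to a stand-in for mid-layer feature maps and then maps the concept scores back to feature space, which is the sense in which the explanation is invertible. It is not the authors' implementation: the synthetic `feature_maps` array, the helper names `nmf` and `relative_error`, the number of concepts `k`, and the use of standard multiplicative updates are all assumptions made for this example.

```python
import numpy as np

def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate non-negative V (m x c) as S @ P with S (m x k) and P (k x c).

    Uses standard multiplicative updates for the Frobenius objective.
    Rows of P play the role of non-negative concept activation vectors;
    S holds the per-position concept scores.
    """
    rng = np.random.default_rng(seed)
    m, c = V.shape
    S = rng.random((m, k))
    P = rng.random((k, c))
    for _ in range(n_iter):
        P *= (S.T @ V) / (S.T @ S @ P + eps)
        S *= (V @ P.T) / (S @ P @ P.T + eps)
    return S, P

def relative_error(V, S, P):
    """Relative Frobenius reconstruction error of S @ P against V."""
    return np.linalg.norm(V - S @ P) / np.linalg.norm(V)

# Hypothetical stand-in for post-ReLU feature maps with shape
# (images, height, width, channels); real maps would come from a CNN layer.
rng = np.random.default_rng(1)
n, h, w, c, k = 8, 4, 4, 16, 3
feature_maps = (rng.random((n * h * w, k)) @ rng.random((k, c))).reshape(n, h, w, c)

# Treat every spatial position as one sample with c channel activations.
V = feature_maps.reshape(-1, c)
S, P = nmf(V, k)

# Concept score maps, and the approximate inverse back to feature space.
concept_maps = S.reshape(n, h, w, k)
reconstructed = (S @ P).reshape(n, h, w, c)

print("relative reconstruction error:", relative_error(V, S, P))
```

The reconstruction error printed at the end is one simple proxy for how much of the layer's information the chosen concepts retain; the paper's own fidelity measurements may be defined differently.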

Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, Benjamin I. P. Rubinstein • 2020

Related benchmarks

Task	Dataset	Result
Concept interpretability	ImageNet	Precision 62	12
Monosemanticity Evaluation	ImageNet	M Metric 7.44	12
Influence Analysis	ImageNet	I187	12
Network Dissection	Broden	Concept Detectors (Color) 0.00e+0	12
Interpretability Evaluation	ImageNet Inception-v3	Coverage 70	12
Interpretable Direction Discovery	Places365	Coverage 57	12
Latent Direction Analysis	Moments in Time (MiT)	Coverage 54	12
Semantic segmentation	ImageNet	S1 Score 24.82	12
Clustering Quality	ImageNet	Coverage 55	12
Object Classification	Caltech-101 (test)	SURFMAE 3.33	7

Showing 10 of 17 rows
