How does this interaction affect me? Interpretable attribution for feature interactions

About

Machine learning transparency calls for interpretable explanations of how inputs relate to predictions. Feature attribution is a way to analyze the impact of features on predictions. Feature interactions are the contextual dependence between features that jointly impact predictions. There are a number of methods that extract feature interactions in prediction models; however, the methods that assign attributions to interactions are either uninterpretable, model-specific, or non-axiomatic. We propose an interaction attribution and detection framework called Archipelago which addresses these problems and is also scalable in real-world settings. Our experiments on standard annotation labels indicate our approach provides significantly more interpretable explanations than comparable methods, which is important for analyzing the impact of interactions on predictions. We also provide accompanying visualizations of our approach that give new insights into deep neural networks.

Michael Tsang, Sirisha Rambhatla, Yan Liu• 2020

Related benchmarks

Task	Dataset	Result
Feature Interaction Attribution	Dyck-2 15,000 corpus size (test)	Average Relative Ranks (ARR)0.25	34
Attribution Faithfulness	ImageNet	--	30
Explanation Sparsity	CUB	Gini Index0.92	24
Attribution Faithfulness	CUB	Faithfulness Correlation0.122	24
Explanation Sparsity	ImageNet	Gini Index0.91	24
Feature Attribution	ImageNet	ROAD_AOPC32	24
Feature Attribution	CUB	ROAD_AOPC0.55	24
Interaction Attribution	Identity Rule language	Average Relative Rank (ARR)0.24	17
Feature Attribution Evaluation	palindrome language	ARR (Static 0)0.356	7

Showing 9 of 9 rows

Other info

Follow for update

@wizwand_team Discord