Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution

About

Strong intelligent machines powered by deep neural networks are increasingly deployed as black boxes to make decisions in risk-sensitive domains, such as finance and medical. To reduce potential risk and build trust with users, it is critical to interpret how such machines make their decisions. Existing works interpret a pre-trained neural network by analyzing hidden neurons, mimicking pre-trained models or approximating local predictions. However, these methods do not provide a guarantee on the exactness and consistency of their interpretation. In this paper, we propose an elegant closed form solution named $OpenBox$ to compute exact and consistent interpretations for the family of Piecewise Linear Neural Networks (PLNN). The major idea is to first transform a PLNN into a mathematically equivalent set of linear classifiers, then interpret each linear classifier by the features that dominate its prediction. We further apply $OpenBox$ to demonstrate the effectiveness of non-negative and sparse constraints on improving the interpretability of PLNNs. The extensive experiments on both synthetic and real world data sets clearly demonstrate the exactness and consistency of our interpretation.

Lingyang Chu, Xia Hu, Juhua Hu, Lanjun Wang, Jian Pei• 2018

Related benchmarks

Task	Dataset	Result
Classification	Adult	Accuracy85.71	86
Classification	Bank	F1 Score72.4	48
Classification	Wine	F1 Macro76.07	48
Classification	Wine	Accuracy90.52	45
Classification	Activity	Accuracy98.28	34
Classification	banknote	Accuracy100	32
Classification	tic-tac-toe	F1 Score100	30
Classification	chess	F1 Score77.85	30
Classification	Adult	F1 Score73.55	30
Classification	Wine	F1 Score76.07	30

Showing 10 of 37 rows

Other info

Follow for update

@wizwand_team Discord