Promises and Pitfalls of Black-Box Concept Learning Models

About

Machine learning models that incorporate concept learning as an intermediate step in their decision-making process can match the performance of black-box predictive models while retaining the ability to explain outcomes in human-understandable terms. However, we demonstrate that the concept representations learned by these models encode information beyond the pre-defined concepts, and that natural mitigation strategies do not fully work, rendering the interpretation of the downstream prediction misleading. We describe the mechanism underlying the information leakage and suggest recourse for mitigating its effects.

Anita Mahinpei, Justin Clark, Isaac Lage, Finale Doshi-Velez, Weiwei Pan • 2021

Related benchmarks

Task	Dataset	Result
Classification	CelebA	Avg Accuracy: 30.24	197
Classification	CUB	Accuracy: 70.7	93
Task Classification	CelebA	Task Accuracy: 84.8	12
Concept Prediction	CelebA	Concept Accuracy: 76.8	11
Classification	Trigonometry	Task Accuracy: 98.67	5
Classification	XOR	Accuracy: 99.23	5
Concept alignment	CUB	Concept Alignment Score: 83.19	5
Concept alignment	CelebA	Concept Alignment Score: 77.48	5
Classification	Dot	Task Accuracy: 96.67	5
Task Prediction	ColorMNIST+	Task Accuracy: 99.4	5

Showing 10 of 14 rows
