Promises and Pitfalls of Black-Box Concept Learning Models
About
Machine learning models that incorporate concept learning as an intermediate step in their decision-making process can match the performance of black-box predictive models while retaining the ability to explain outcomes in human-understandable terms. However, we demonstrate that the concept representations learned by these models encode information beyond the pre-defined concepts, and that natural mitigation strategies are not fully effective, rendering the interpretation of the downstream prediction misleading. We describe the mechanism underlying this information leakage and suggest recourse for mitigating its effects.
Anita Mahinpei, Justin Clark, Isaac Lage, Finale Doshi-Velez, Weiwei Pan • 2021
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Classification | CelebA | Avg Accuracy | 30.24 | 137 |
| Classification | CUB | Accuracy | 70.7 | 85 |
| Classification | Trigonometry | Task Accuracy | 98.67 | 5 |
| Classification | XOR | Accuracy | 99.23 | 5 |
| Concept alignment | CUB | Concept Alignment Score | 83.19 | 5 |
| Concept alignment | CelebA | Concept Alignment Score | 77.48 | 5 |
| Classification | Dot | Task Accuracy | 96.67 | 5 |
| Concept alignment | XOR | Concept Alignment Score | 98.53 | 5 |
| Concept alignment | Trigonometry | Concept Alignment Score | 73.75 | 5 |
| Concept alignment | Dot | Concept Alignment Score | 72.66 | 5 |