Post-hoc Concept Bottleneck Models

About

Concept Bottleneck Models (CBMs) map the inputs onto a set of interpretable concepts ("the bottleneck") and use the concepts to make predictions. A concept bottleneck enhances interpretability since it can be investigated to understand what concepts the model "sees" in an input and which of these concepts are deemed important. However, CBMs are restrictive in practice as they require dense concept annotations in the training data to learn the bottleneck. Moreover, CBMs often do not match the accuracy of an unrestricted neural network, reducing the incentive to deploy them in practice. In this work, we address these limitations of CBMs by introducing Post-hoc Concept Bottleneck Models (PCBMs). We show that we can turn any neural network into a PCBM without sacrificing model performance while still retaining the interpretability benefits. When concept annotations are not available on the training data, we show that PCBM can transfer concepts from other datasets or from natural language descriptions of concepts via multimodal models. A key benefit of PCBM is that it enables users to quickly debug and update the model to reduce spurious correlations and improve generalization to new distributions. PCBM allows for global model edits, which can be more efficient than previous works on local interventions that fix a specific prediction. Through a model-editing user study, we show that editing PCBMs via concept-level feedback can provide significant performance gains without using data from the target domain or model retraining.

Mert Yuksekgonul, Maggie Wang, James Zou • 2022
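
The idea in the abstract (score an input against a set of concepts, predict from those scores, then edit the predictor at the concept level) can be illustrated with a minimal sketch. This is not the authors' implementation. The concept names, concept vectors, class names, weights, and the helper functions `project`, `predict`, and `prune` are all hypothetical, the weights are hand-set rather than trained, and the projection formula (inner product divided by the squared norm of the concept vector) is an assumption made for illustration.

```python
# Toy sketch of a post-hoc concept bottleneck. Every value below is made up
# for illustration; nothing here is taken from the paper's code or data.

# Hypothetical concept bank: one vector per concept, living in the same
# space as the backbone's embeddings.
CONCEPTS = ["stripes", "snow"]
CONCEPT_BANK = [
    [2.0, 0.0, 0.0],  # "stripes"
    [0.0, 1.0, 0.0],  # "snow"
]

CLASSES = ["zebra", "husky"]

# Stand-in for a trained linear layer over concept scores:
# WEIGHTS[k][i] is the weight of concept i for class k.
WEIGHTS = [
    [1.0, -0.5],  # zebra
    [-1.0, 2.0],  # husky: leans on "snow", a spurious background cue
]
BIAS = [0.0, 0.0]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def project(embedding, bank):
    """Concept scores (assumed form): <embedding, c> / ||c||^2 per concept c."""
    return [dot(embedding, c) / dot(c, c) for c in bank]


def predict(embedding, weights):
    """Predict a class name from concept scores with a linear layer."""
    scores = project(embedding, CONCEPT_BANK)
    logits = [dot(w, scores) + b for w, b in zip(weights, BIAS)]
    return CLASSES[logits.index(max(logits))]


def prune(weights, class_name, concept_name):
    """Global edit: copy the weights and zero one concept's weight for one class."""
    k = CLASSES.index(class_name)
    i = CONCEPTS.index(concept_name)
    edited = [row[:] for row in weights]
    edited[k][i] = 0.0
    return edited


if __name__ == "__main__":
    zebra_in_snow = [2.0, 1.5, 0.3]  # hypothetical backbone embedding
    print(project(zebra_in_snow, CONCEPT_BANK))  # concept scores
    print(predict(zebra_in_snow, WEIGHTS))       # before the edit
    edited = prune(WEIGHTS, "husky", "snow")
    print(predict(zebra_in_snow, edited))        # after the edit
```

In this toy setup the edit touches a single weight, yet it changes the prediction for every input whose "husky" score depended on "snow". That is the sense in which a concept-level edit is global rather than a fix for one prediction, and no retraining is involved.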

Related benchmarks

Task                  Dataset               Result                Rank
Image Classification  CIFAR-100 (test)      Accuracy 66.83        3518
Image Classification  CIFAR-100 (val)       Accuracy 51.33        661
Image Classification  CIFAR100              Accuracy 57.2         331
Image Classification  CUB-200-2011 (test)   Top-1 Acc 63.63       276
Image Classification  ImageNet              Accuracy 62.57        184
Image Classification  CIFAR10               Accuracy 83.34        125
Image Classification  CUB-200               Accuracy 63.92        92
Image Classification  CUB                   --                    89
Image Classification  CUB200 (val)          Accuracy 64.65        66
Image Classification  Places365             Top-1 Accuracy 39.66  62
Showing 10 of 34 rows
