Zero-shot Concept Bottleneck Models
About
Concept bottleneck models (CBMs) are inherently interpretable and intervenable neural network models, which explain their final label predictions via intermediate predictions of high-level semantic concepts. However, they require training on the target task to learn the input-to-concept and concept-to-label mappings, incurring the costs of target dataset collection and training resources. In this paper, we present zero-shot concept bottleneck models (Z-CBMs), which predict concepts and labels in a fully zero-shot manner without training neural networks. Z-CBMs utilize a large-scale concept bank, composed of millions of vocabulary terms extracted from the web, to describe arbitrary inputs across various domains. For the input-to-concept mapping, we introduce concept retrieval, which dynamically finds input-related concepts via cross-modal search on the concept bank. For the concept-to-label inference, we apply concept regression to select essential concepts from the retrieved ones via sparse linear regression. Through extensive experiments, we confirm that our Z-CBMs provide interpretable and intervenable concepts without any additional training. Code will be available at https://github.com/yshinya6/zcbm.
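The two-stage pipeline described above (concept retrieval followed by concept regression) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes CLIP-style L2-normalized image and concept embeddings (stood in here by random vectors) and uses scikit-learn's Lasso for the sparse linear regression step.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Hypothetical stand-ins for CLIP embeddings (assumptions, not the paper's code):
# a concept bank of text embeddings and one image embedding, all L2-normalized.
d, n_concepts = 64, 500
concept_bank = rng.normal(size=(n_concepts, d))
concept_bank /= np.linalg.norm(concept_bank, axis=1, keepdims=True)
image = rng.normal(size=d)
image /= np.linalg.norm(image)

# Step 1: concept retrieval -- cross-modal nearest-neighbor search,
# ranking the concept bank by cosine similarity to the image embedding.
k = 32
sims = concept_bank @ image
top_idx = np.argsort(-sims)[:k]
retrieved = concept_bank[top_idx]          # (k, d) retrieved concept embeddings

# Step 2: concept regression -- sparse linear regression that reconstructs
# the image embedding from the retrieved concepts; nonzero weights mark
# the essential concepts used to explain the prediction.
reg = Lasso(alpha=1e-3, fit_intercept=False, max_iter=10_000)
reg.fit(retrieved.T, image)                # solve image ~= retrieved.T @ w
weights = reg.coef_                        # (k,) sparse concept weights
essential = top_idx[np.abs(weights) > 1e-6]
print(f"{essential.size} essential concepts selected out of {k} retrieved")
```

In a real setting, the final label would then be inferred from the selected concepts' weights (e.g., by combining them with concept-to-class text similarities); the sparsity of the Lasso solution is what keeps the explanation compact and intervenable.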
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | 12 Image Classification Datasets | Top-1 Accuracy | 78.31 | 12 |
| Image Classification | Places365 | Accuracy (Seen) | 34.9 | 4 |
| Image Classification | CIFAR-100 | Accuracy (Seen Classes) | 31.9 | 4 |
| Image Classification | ImageNet-100 | Seen Accuracy | 59.2 | 4 |
| Image Classification | ImageNet-1K | Seen Score | 43.9 | 4 |
| Inference | ImageNet-100 | Embedding Latency (ms/img) | 97.55 | 4 |