Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery

About

Concept Bottleneck Models (CBMs) have recently been proposed to address the 'black-box' problem of deep neural networks, by first mapping images to a human-understandable concept space and then linearly combining concepts for classification. Such models typically require first coming up with a set of concepts relevant to the task and then aligning the representations of a feature extractor to map to these concepts. However, even with powerful foundational feature extractors like CLIP, there are no guarantees that the specified concepts are detectable. In this work, we leverage recent advances in mechanistic interpretability and propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm: instead of pre-selecting concepts based on the downstream classification task, we use sparse autoencoders to first discover concepts learnt by the model, and then name them and train linear probes for classification. Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model. We perform a comprehensive evaluation across multiple datasets and CLIP architectures and show that our method yields semantically meaningful concepts, assigns appropriate names to them that make them easy to interpret, and yields performant and interpretable CBMs. Code available at https://github.com/neuroexplicit-saar/discover-then-name.
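The three stages named in the abstract (discover concepts with a sparse autoencoder, name them, then train a linear probe on the concept activations) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the autoencoder architecture, the loss, or the naming procedure. The ReLU encoder, the L1 sparsity penalty, the naming by cosine similarity to word embeddings, and every function name, weight, and vocabulary entry below are assumptions chosen for illustration, with toy 2-dimensional vectors standing in for CLIP features.

```python
import math

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec):
    # Assumed SAE form: ReLU encoder, linear decoder.
    # W_dec holds one row per concept (its "dictionary vector").
    pre = [a + b for a, b in zip(matvec(W_enc, x), b_enc)]
    z = [max(0.0, p) for p in pre]  # sparse concept activations
    x_hat = [sum(z[k] * W_dec[k][j] for k in range(len(z)))
             for j in range(len(x))]
    return z, x_hat

def sae_loss(x, z, x_hat, l1_coeff=1e-3):
    # Assumed objective: squared reconstruction error plus L1 sparsity penalty.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + l1_coeff * sum(abs(a) for a in z)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def name_concepts(W_dec, vocab_embeddings):
    # Assumed naming rule: give each concept the vocabulary word whose
    # embedding is most cosine-similar to that concept's dictionary vector.
    return [max(vocab_embeddings, key=lambda w: cosine(d, vocab_embeddings[w]))
            for d in W_dec]

def classify(z, W_cls, b_cls):
    # Linear probe on concept activations; returns the arg-max class index.
    logits = [a + b for a, b in zip(matvec(W_cls, z), b_cls)]
    return logits.index(max(logits))

# Toy data (hypothetical): 2-d "image feature", 2 concepts, 2 classes.
W_enc = [[1.0, 0.0], [0.0, 1.0]]
b_enc = [0.0, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0]]
vocab = {"stripes": [1.0, 0.1], "water": [0.0, 1.0]}

x = [0.9, -0.2]
z, x_hat = sae_forward(x, W_enc, b_enc, W_dec)
names = name_concepts(W_dec, vocab)
pred = classify(z, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In this sketch the autoencoder and the naming step never see class labels, which mirrors the task-agnostic claim in the abstract; only the final linear probe would be trained per downstream dataset.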

Sukrut Rao, Sweta Mahajan, Moritz Böhle, Bernt Schiele • 2024

Related benchmarks

Task	Dataset	Result
Image Classification	Food-101	Accuracy 92.2	570
Image Classification	Flowers102	Accuracy 96.6	558
Image Classification	RESISC45	--	472
Image Classification	Food101	Accuracy 82.3	457
Image Classification	CUB-200 2011	Accuracy 83.3	374
Image Classification	CUB	Accuracy 66.19	331
Image Classification	CUB-200-2011 (test)	Top-1 Acc 68.38	303
Image Classification	ImageNet (test)	Top-1 Accuracy 79.5	299
Image Classification	Oxford Flowers 102	--	234
Image Classification	CIFAR-10 (test)	Accuracy 87.6	59

Showing 10 of 27 rows
