Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models

About

Concept Bottleneck Models (CBMs) aim to make the decision-making process transparent by introducing an intermediate concept space between the input image and the output prediction. Existing CBMs learn only coarse-grained relations between the whole image and the concepts and largely ignore local image information, which leads to two main drawbacks: i) they often produce spurious visual-concept relations, which reduces model reliability; and ii) although CBMs can explain the importance of each concept to the final prediction, it remains difficult to tell which visual region produces the prediction. To address these problems, this paper proposes a Disentangled Optimal Transport CBM (DOT-CBM) framework that explores fine-grained visual-concept relations between local image patches and concepts. Specifically, we model concept prediction as a transportation problem between patches and concepts, thereby achieving explicit fine-grained feature alignment. We also incorporate orthogonal projection losses within each modality to enhance local feature disentanglement. To further address shortcut issues caused by statistical biases in the data, we use the visual saliency map and concept label statistics as transportation priors. As a result, DOT-CBM can visualize inversion heatmaps, provide more reliable concept predictions, and produce more accurate class predictions. Comprehensive experiments demonstrate that DOT-CBM achieves state-of-the-art (SOTA) performance on several tasks, including image classification, local part detection, and out-of-distribution generalization.

Yan Xie, Zequn Zeng, Hao Zhang, Yucheng Ding, Yi Wang, Zhengjue Wang, Bo Chen, Hongwei Liu • 2025
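
To make the transportation view in the abstract concrete, the following is a minimal sketch of entropic optimal transport between image patches and concepts, solved with plain Sinkhorn iterations. It is not the authors' implementation. The cosine cost, the regularization strength eps, the random features, the 4x4 patch grid, and the function names sinkhorn_plan and cosine_cost are all assumptions made for illustration; the only ideas taken from the abstract are that patches are transported to concepts and that a saliency map and concept label statistics serve as the two marginal priors.

```python
import numpy as np


def cosine_cost(patch_feats, concept_feats):
    """Cost matrix of shape (n_patches, n_concepts): 1 - cosine similarity."""
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    c = concept_feats / np.linalg.norm(concept_feats, axis=1, keepdims=True)
    return 1.0 - p @ c.T


def sinkhorn_plan(cost, patch_prior, concept_prior, eps=0.5, n_iters=500):
    """Entropic OT plan whose row sums approach patch_prior and whose
    column sums match concept_prior (hypothetical helper, not from the paper)."""
    kernel = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(patch_prior)
    v = np.ones_like(concept_prior)
    for _ in range(n_iters):
        u = patch_prior / (kernel @ v)      # rescale rows toward patch_prior
        v = concept_prior / (kernel.T @ u)  # rescale columns toward concept_prior
    return u[:, None] * kernel * v[None, :]


# Toy data (assumed): 16 patches on a 4x4 grid, 5 concepts, 8-dim features.
rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(16, 8))
concept_feats = rng.normal(size=(5, 8))

# Stand-ins for the two priors named in the abstract.
saliency = rng.random(16)                   # visual saliency per patch
patch_prior = saliency / saliency.sum()
concept_freq = rng.random(5) + 0.1          # concept label statistics
concept_prior = concept_freq / concept_freq.sum()

plan = sinkhorn_plan(cosine_cost(patch_feats, concept_feats),
                     patch_prior, concept_prior)

# One column of the plan, reshaped to the patch grid, gives a per-concept heatmap.
heatmap_concept0 = plan[:, 0].reshape(4, 4)
```

Each entry plan[i, k] is the mass moved from patch i to concept k, so a column of the plan can be read as a spatial map for that concept. How DOT-CBM turns the plan into concept predictions and inversion heatmaps is not specified in the abstract, so that step is left out here.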

Related benchmarks

Task	Dataset	Metric	Result
Image Classification	CIFAR100	Accuracy	85.83	378
Image Classification	CUB	Accuracy	85.39	351
Fine-grained Image Classification	CUB-200 2011	Accuracy	85.39	317
Image Classification	CUB-200	Accuracy	83.76	126
Classification	CIFAR-10	Accuracy	97.75	108
Classification	CUB	Accuracy	85.39	100
Classification	CIFAR100	Accuracy	85.83	90
Image Classification	Places365	--	--	79
Classification	CIFAR-100	Top-1 Accuracy	84.75	61
Image Classification	ImageNet	Accuracy	83.84	47

Showing 10 of 21 rows
