
Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts

About

In this paper, we investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are interpretable neural architectures that first predict a set of high-level concepts (e.g., stripes, black) and then predict a task label from those concepts. In particular, we study the impact of concept interventions (i.e., operations where a human expert corrects a CM's mispredicted concepts at test time) on CMs' task predictions when inputs are OOD. Our analysis reveals a weakness in current state-of-the-art CMs, which we term leakage poisoning, that prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, we introduce MixCEM, a new CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution. Our results across tasks with and without complete sets of concept annotations demonstrate that MixCEMs outperform strong baselines by significantly improving their accuracy for both in-distribution and OOD samples in the presence and absence of concept interventions.
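The two-stage prediction and the concept intervention described above can be illustrated with a toy sketch. This is a generic concept-bottleneck pipeline, not MixCEM and not the authors' code: the function names (`predict_concepts`, `intervene`, `predict_label`), the hand-picked weights, and the toy input are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(x, W):
    """Compute x @ W for a vector x and a matrix W given as a list of rows."""
    return [sum(xi * row[j] for xi, row in zip(x, W)) for j in range(len(W[0]))]

def predict_concepts(x, W_c):
    """Stage 1 (hypothetical encoder): input features -> concept probabilities."""
    return [sigmoid(z) for z in matvec(x, W_c)]

def intervene(c_hat, true_concepts, mask):
    """Replace the concepts selected by `mask` with expert-provided values."""
    return [t if m else c for c, t, m in zip(c_hat, true_concepts, mask)]

def predict_label(c, W_y):
    """Stage 2 (hypothetical label predictor): concepts -> index of the top class."""
    logits = matvec(c, W_y)
    return max(range(len(logits)), key=logits.__getitem__)

# Toy weights chosen by hand (assumption): 3 input features, 2 concepts, 2 classes.
W_c = [[4.0, 0.0],
       [0.0, 4.0],
       [0.0, 0.0]]
W_y = [[2.0, -2.0],
       [-2.0, 2.0]]

x = [-1.0, 1.0, 0.0]          # the encoder mispredicts both concepts for this input
true_concepts = [1.0, 0.0]    # values a human expert would supply

c_hat = predict_concepts(x, W_c)
label_before = predict_label(c_hat, W_y)

# Intervene on both concepts, then re-run only the label predictor.
c_fixed = intervene(c_hat, true_concepts, mask=[True, True])
label_after = predict_label(c_fixed, W_y)
```

In this sketch the label depends only on the concept vector, so correcting the concepts is enough to change the prediction. Models that also pass leaked, non-concept information to the label predictor are the setting in which the abstract's leakage poisoning arises for OOD inputs.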

Mateo Espinosa Zarlenga, Gabriele Dominici, Pietro Barbiero, Zohreh Shams, Mateja Jamnik • 2025

Related benchmarks

Task                              Dataset    Result                 Rank
Classification                    CUB        --                     93
Classification                    CIFAR10    Accuracy 88.79         68
Task Classification               AwA        Task Accuracy 100      35
Task Classification               AWA Inc    Task Accuracy 97.58    35
Task Classification               CUB Inc    Task Accuracy 87.52    35
Animal Classification             AwA        Task Accuracy 89.52    8
Animal Classification             AWA Inc    Task Accuracy 86.05    8
Fine-grained Bird Classification  CUB        Task Accuracy 48.65    8
Fine-grained Bird Classification  CUB Inc    Task Accuracy 35.97    8
Image Classification              CIFAR10    Task Accuracy 76.61    8
