Avoiding Leakage Poisoning: Concept Interventions Under Distribution Shifts

About

In this paper, we investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are interpretable neural architectures that first predict a set of high-level concepts (e.g., stripes, black) and then predict a task label from those concepts. In particular, we study the impact of concept interventions (i.e., operations where a human expert corrects a CM's mispredicted concepts at test time) on CMs' task predictions when inputs are OOD. Our analysis reveals a weakness in current state-of-the-art CMs, which we term leakage poisoning, that prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, we introduce MixCEM, a new CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution. Our results across tasks with and without complete sets of concept annotations demonstrate that MixCEMs outperform strong baselines by significantly improving their accuracy for both in-distribution and OOD samples in the presence and absence of concept interventions.
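
To make the notion of a concept intervention concrete, the following is a minimal sketch of a generic concept-bottleneck-style predictor on a toy example. It is not the paper's MixCEM architecture and does not model leakage. The weights, the input, the two concept names, and the helper names (predict_concepts, intervene, predict_label) are all hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_concepts(x, W_c, b_c):
    """Map input features to concept probabilities (the bottleneck)."""
    return sigmoid(x @ W_c + b_c)

def intervene(c_hat, c_true, mask):
    """Replace the predicted concepts selected by `mask` with expert-provided values."""
    return np.where(mask, c_true, c_hat)

def predict_label(c, W_y, b_y):
    """Predict the task label from concept values only."""
    return int(np.argmax(c @ W_y + b_y))

# Hypothetical toy setup: two concepts ("stripes", "black") and two classes.
W_c = np.eye(2)
b_c = np.zeros(2)
W_y = np.array([[2.0, -2.0],
                [1.0,  1.0]])
b_y = np.array([-1.0, 1.0])

x = np.array([-2.0, 3.0])        # input for which "stripes" is mispredicted
c_true = np.array([1.0, 1.0])    # ground-truth concepts known to the expert

c_hat = predict_concepts(x, W_c, b_c)
label_before = predict_label(c_hat, W_y, b_y)

# The expert corrects only the first concept ("stripes") at test time.
mask = np.array([True, False])
c_fixed = intervene(c_hat, c_true, mask)
label_after = predict_label(c_fixed, W_y, b_y)

print("label before intervention:", label_before)
print("label after intervention:", label_after)
```

In this sketch the label depends on the concepts alone, so correcting a concept directly changes the prediction. The models studied in the paper additionally pass leaked information alongside the concepts, which is the part this example deliberately leaves out.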

Mateo Espinosa Zarlenga, Gabriele Dominici, Pietro Barbiero, Zohreh Shams, Mateja Jamnik • 2025

Related benchmarks

Task	Dataset	Result
Classification	CUB	--	100
Classification	CIFAR10	Accuracy 88.79	83
Task Classification	AwA	Task Accuracy 100	35
Task Classification	AWA Inc	Task Accuracy 97.58	35
Task Classification	CUB Inc	Task Accuracy 87.52	35
Animal Classification	AwA	Task Accuracy 89.52	8
Animal Classification	AWA Inc	Task Accuracy 86.05	8
Fine-grained Bird Classification	CUB	Task Accuracy 48.65	8
Fine-grained Bird Classification	CUB Inc	Task Accuracy 35.97	8
Image Classification	CIFAR10	Task Accuracy 76.61	8
