SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models
About
Models that bridge vision and language, such as CLIP, are key components of multimodal AI, yet their large-scale, uncurated training data introduce severe social and spurious biases. Existing post-hoc debiasing methods often operate directly in the dense CLIP embedding space, where bias and task-relevant information are highly entangled. This entanglement limits their ability to remove bias without degrading semantic fidelity. In this work, we propose Sparse Embedding Modulation (SEM), a post-hoc, zero-shot debiasing framework that operates in a Sparse Autoencoder (SAE) latent space. By decomposing CLIP text embeddings into disentangled features, SEM identifies and modulates bias-relevant neurons while preserving query-relevant ones. This enables more precise, non-linear interventions. Across four benchmark datasets and two CLIP backbones, SEM achieves substantial fairness gains in retrieval and zero-shot classification. Our results demonstrate that sparse latent representations provide an effective foundation for post-hoc debiasing of vision-language models.
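The modulation step described above can be pictured as: encode the CLIP text embedding into SAE latents, scale down the bias-relevant neurons while leaving query-relevant ones untouched, and decode back to the dense embedding space. The sketch below is a minimal illustration under assumed names; the `SparseAutoencoder` class, the `sem_debias` helper, and the bias-neuron indices are placeholders, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE over CLIP embeddings (illustrative, not the paper's)."""
    def __init__(self, d_emb: int, d_latent: int):
        super().__init__()
        self.enc = nn.Linear(d_emb, d_latent)
        self.dec = nn.Linear(d_latent, d_emb)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU keeps latent activations non-negative and sparse
        return torch.relu(self.enc(x))

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return self.dec(z)

def sem_debias(text_emb: torch.Tensor, sae: SparseAutoencoder,
               bias_idx: torch.Tensor, scale: float = 0.0) -> torch.Tensor:
    """Encode into SAE latents, down-weight bias-relevant neurons,
    decode back, and renormalize for cosine-similarity scoring."""
    z = sae.encode(text_emb)
    z[..., bias_idx] = z[..., bias_idx] * scale  # modulate bias neurons only
    out = sae.decode(z)
    return out / out.norm(dim=-1, keepdim=True)

# Hypothetical usage with a trained SAE and pre-identified bias neurons:
sae = SparseAutoencoder(d_emb=512, d_latent=4096)
emb = torch.randn(1, 512)
emb = emb / emb.norm(dim=-1, keepdim=True)
bias_idx = torch.tensor([17, 203, 911])  # placeholder neuron indices
debiased = sem_debias(emb, sae, bias_idx)
```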
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Social Bias Evaluation | FairFace | MS | 0.079 | 54 |
| Bias Mitigation for Stereotype Queries | UTKFace Race | KL Divergence | 0.035 | 33 |
| Bias Mitigation for Stereotype Queries | UTKFace Gender | KL Divergence | 0.009 | 33 |
| Image Retrieval | CelebA Stereotype queries | KL Divergence | 0.03 | 24 |
| Zero-shot classification fairness | CelebA Gender | Accuracy | 85.1 | 24 |
| Zero-shot classification fairness | Waterbirds Background | Accuracy (Zero-shot) | 88.1 | 24 |
| Image Retrieval | CelebA Hair Color queries | KL Divergence | 0.029 | 24 |
| Classification | CelebA Gender (test) | Accuracy | 85.6 | 24 |
| Classification | Waterbirds Background (test) | Accuracy | 85.5 | 24 |
| Debiasing | 100 Profession Prompts | Content Preservation | 0.878 | 2 |
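The KL Divergence entries above presumably measure how far the demographic distribution of retrieved results deviates from a balanced reference, with 0 indicating perfectly balanced retrieval. A minimal sketch of such a metric, assuming a uniform reference distribution (the `retrieval_kl` helper is hypothetical):

```python
import numpy as np

def retrieval_kl(group_counts, eps=1e-8):
    """KL divergence between the demographic distribution of retrieved
    images and a uniform (balanced) reference; 0 means perfectly fair."""
    p = np.asarray(group_counts, dtype=float)
    p = p / p.sum()
    q = np.full_like(p, 1.0 / len(p))  # uniform reference distribution
    return float(np.sum(p * np.log((p + eps) / q)))

# e.g. top-100 retrievals split 70/30 between two gender groups:
print(retrieval_kl([70, 30]))  # ~0.082; a balanced 50/50 split gives 0.0
```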