Finding Culture-Sensitive Neurons in Vision-Language Models

About

Despite their impressive performance, vision-language models (VLMs) still struggle with culturally situated inputs. To understand how VLMs process culturally grounded information, we study the presence of culture-sensitive neurons, i.e., neurons whose activations show preferential sensitivity to inputs associated with particular cultural contexts. We examine whether such neurons are important for culturally diverse visual question answering and where they are located. Using the CVQA benchmark, we identify culture-selective neurons and perform diagnostic tests by deactivating the neurons flagged by various identification methods. Experiments on three VLMs across 25 cultural groups demonstrate the existence of neurons whose ablation disproportionately harms performance on questions about the corresponding cultures while having limited effects on others. Moreover, we introduce a new margin-based selector, Contrastive Activation Margin (ConAct), and show that it outperforms probability- and entropy-based methods in identifying culture-selective neurons. Finally, our layer-wise analyses reveal that such neurons are not uniformly distributed: they cluster in specific decoder layers in a model-dependent way.
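
The abstract does not spell out how ConAct scores a neuron, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a margin-based selector ranks each neuron by the gap between its mean activation on one culture's inputs and its highest mean activation on any other culture. It also assumes that "deactivating" a neuron means zeroing its activation, and that neurons are indexed layer by layer with a fixed width per layer. The function names (contrastive_activation_margin, top_k_neurons, ablate, layer_histogram) and the toy activations are hypothetical. In practice the activations would be recorded from a VLM's decoder layers on CVQA questions.

```python
from statistics import fmean


def mean_activation(examples):
    """Average each neuron's activation over a list of examples."""
    n = len(examples[0])
    return [fmean(ex[i] for ex in examples) for i in range(n)]


def contrastive_activation_margin(acts_by_culture, target):
    """Hypothetical margin score per neuron (assumption, not the paper's formula):
    mean activation on the target culture minus the highest mean activation
    among all other cultures."""
    means = {c: mean_activation(ex) for c, ex in acts_by_culture.items()}
    others = [m for c, m in means.items() if c != target]
    if not others:
        raise ValueError("need at least one non-target culture")
    return [
        means[target][i] - max(m[i] for m in others)
        for i in range(len(means[target]))
    ]


def top_k_neurons(scores, k):
    """Indices of the k highest-scoring neurons, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def ablate(activations, neuron_ids):
    """Deactivate the selected neurons by zeroing them (an assumed ablation choice)."""
    return [0.0 if i in neuron_ids else a for i, a in enumerate(activations)]


def layer_histogram(neuron_ids, neurons_per_layer):
    """Count selected neurons per layer, assuming a flat layer-by-layer index."""
    counts = {}
    for i in neuron_ids:
        layer = i // neurons_per_layer
        counts[layer] = counts.get(layer, 0) + 1
    return counts


# Toy activations: culture -> list of examples, each a list of 4 neuron values
# (2 hypothetical layers x 2 neurons per layer).
acts = {
    "A": [[2.0, 0.5, 0.1, 1.0], [1.8, 0.4, 0.2, 1.2]],
    "B": [[0.2, 0.6, 0.1, 1.1], [0.4, 0.5, 0.3, 0.9]],
    "C": [[0.3, 0.4, 2.0, 1.0], [0.1, 0.6, 1.8, 1.0]],
}

scores = contrastive_activation_margin(acts, "A")
selected = top_k_neurons(scores, 2)
print(selected)                                      # [0, 3]
print(layer_histogram(selected, neurons_per_layer=2))  # {0: 1, 1: 1}
print(ablate([1.0, 2.0, 3.0, 4.0], set(selected)))   # [0.0, 2.0, 3.0, 0.0]
```

Taking the maximum over the other cultures, rather than their average, is one way to make such a score contrastive: under this assumption a neuron ranks highly only if no other single culture activates it comparably. The layer histogram is the kind of count a layer-wise analysis would aggregate.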

Xiutian Zhao, Rochelle Choenni, Rohit Saxena, Ivan Titov • 2025

Related benchmarks

Task	Dataset	Metric	Result	Rank
Cultural Visual Question Answering	CVQA (eval)	Accuracy Delta	5.52	39
