CrossCult-KIBench: A Benchmark for Cross-Cultural Knowledge Insertion in MLLMs

About

Multimodal Large Language Models (MLLMs), trained primarily on English-centric data, frequently generate culturally inappropriate or misaligned responses in cross-cultural settings. To mitigate this, we introduce the task of cross-cultural knowledge insertion, which focuses on adapting models to specific cultural contexts while preserving their original behavior in other cultures. To facilitate research in this area, we present CrossCult-KIBench, a comprehensive evaluation benchmark for assessing both the effectiveness of knowledge insertion and its unintended side effects on non-target cultures. The benchmark includes 9,800 image-grounded cases covering 49 culturally relevant visual scenarios across English, Chinese, and Arabic language-culture groups. It supports evaluation in both single-insert and sequential-insert settings. We also propose Memory-Conditioned Knowledge Insertion (MCKI) as a baseline method. MCKI retrieves relevant cultural knowledge from an external memory using frozen MLLM representations and, when a match is found, prepends the matched entries as conditional prompts. Extensive experiments on CrossCult-KIBench reveal that current approaches struggle to balance effective cultural adaptation with behavioral preservation, highlighting a key challenge in developing culturally aware MLLMs. Our work thus underscores an important research direction for developing more culturally adaptive and responsible MLLMs.

Zhen Zeng, Leijiang Gu, Feng Li, Jing Yu, Zenglin Shi • 2026
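
The retrieve-then-prepend idea behind MCKI, as described in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation. Everything named here is a hypothetical stand-in: `toy_embed` replaces the frozen MLLM representations with a text-only bag-of-words vector (the benchmark itself is image-grounded), and the `CulturalMemory` class, the cosine-similarity matching rule, the `threshold` value, and the prompt template are all assumptions made for illustration.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for frozen MLLM features: a sparse bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CulturalMemory:
    """External memory of (key embedding, knowledge text) pairs (assumed design)."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold  # assumed similarity cutoff for a "match"
        self.keys = []
        self.entries = []

    def insert(self, query_text, knowledge):
        # Inserting knowledge only appends to memory; no model weights change.
        self.keys.append(toy_embed(query_text))
        self.entries.append(knowledge)

    def retrieve(self, query_text):
        # Return the best-matching entry, or None when nothing is similar enough.
        if not self.keys:
            return None
        query = toy_embed(query_text)
        sims = [cosine(query, key) for key in self.keys]
        best = max(range(len(sims)), key=sims.__getitem__)
        return self.entries[best] if sims[best] >= self.threshold else None


def build_prompt(memory, question):
    """Prepend matched knowledge as a conditional prompt; otherwise pass through."""
    knowledge = memory.retrieve(question)
    if knowledge is None:
        return question  # non-target queries reach the model unchanged
    return f"Cultural context: {knowledge}\n{question}"
```

In this sketch, a query that matches no memory entry is passed to the model unmodified, which is one simple way to picture how behavior on non-target cultures could be preserved.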

Related benchmarks

Task	Dataset	Result	Rank
Sequential-insert knowledge insertion	CrossCult-KIBench 1.0 (test)	Final Relational ROUGE-L: 85.57	14
Knowledge Insertion	CrossCult-KIBench single-insert	Reliability (ROUGE-L): 79.83	14
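
Both results in the table are ROUGE-L scores. As a rough illustration of the metric, the sketch below computes a ROUGE-L F-measure from the longest common subsequence (LCS) of two token sequences. The whitespace tokenization, lowercasing, and balanced F1 weighting are assumptions; the benchmark's exact ROUGE-L configuration is not stated here. The function returns a value in [0, 1], so multiplying by 100 would be needed to compare against percentage-style scores.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 over whitespace tokens (assumed tokenization)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)
```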
