Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

About

Multimodal large language models (MLLMs) have shown impressive reasoning abilities. However, they are also more vulnerable to jailbreak attacks than their LLM predecessors. We observe that, although MLLMs remain capable of detecting unsafe responses, the safety mechanisms of the pre-aligned LLMs inside them can be easily bypassed once image features are introduced. To construct robust MLLMs, we propose ECSO (Eyes Closed, Safety On), a novel training-free protection approach that exploits the inherent safety awareness of MLLMs and generates safer responses by adaptively transforming unsafe images into text, which activates the intrinsic safety mechanism of the pre-aligned LLMs in MLLMs. Experiments on five state-of-the-art (SoTA) MLLMs demonstrate that ECSO enhances model safety significantly (e.g., 37.6% improvement on MM-SafetyBench (SD+OCR) and 71.3% on VLSafe with LLaVA-1.5-7B), while consistently maintaining utility results on common MLLM benchmarks. Furthermore, we show that ECSO can be used as a data engine to generate supervised fine-tuning (SFT) data for MLLM alignment without extra human intervention.

Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang • 2024
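
The abstract describes the ECSO flow only at a high level: the model answers, checks its own answer for harm, and, if the answer is unsafe, replaces the image with text before answering again. The following is a minimal sketch of that flow, not the authors' implementation. It assumes a hypothetical `generate(prompt, image)` callable standing in for any MLLM. The function name, the prompt wording, the yes/no parsing, and the choice to show the image during the self-check are all illustrative assumptions and are not taken from the paper.

```python
from typing import Callable, Optional

# Hypothetical model interface (an assumption, not a real library API):
# takes a text prompt and an optional image, returns generated text.
Generate = Callable[[str, Optional[bytes]], str]


def ecso_style_answer(generate: Generate, query: str, image: bytes) -> str:
    """Sketch of the flow described in the abstract; prompts are illustrative."""
    # 1. Answer normally, with the image visible to the model.
    draft = generate(query, image)

    # 2. Ask the same model to judge its own draft. Passing the image here
    #    is an assumption of this sketch.
    verdict = generate(
        f"Is the following response harmful? Answer yes or no.\n\n{draft}",
        image,
    )
    if not verdict.strip().lower().startswith("yes"):
        return draft  # judged safe: keep the original response

    # 3. "Eyes closed": turn the image into text with a caption prompt.
    caption = generate(
        f"Describe the image content relevant to this request: {query}",
        image,
    )

    # 4. Answer again from text only (no image), so the request reaches
    #    the underlying LLM as plain text.
    return generate(f"Image description: {caption}\n\nRequest: {query}", None)
```

Because the extra steps run only when the self-check flags the draft, responses judged safe are returned unchanged, which matches the abstract's description of the transformation as adaptive.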

Related benchmarks

Task	Dataset	Result
Science Question Answering	ScienceQA	--	791
Multimodal Reasoning	MM-Vet	MM-Vet Score 52.4	517
Multimodal Capability Evaluation	MM-Vet	--	393
Mathematical Multimodal Reasoning	MathVista	Accuracy 64.6	258
Multimodal Reasoning	MMMU	Accuracy 61.4	208
Multimodal Evaluation	MM-Vet	Score 35.5	196
Visual Question Answering	GQA	Score 63.2	193
Multimodal Reasoning	WeMath	Accuracy 38.4	171
Multimodal Reasoning	MathVision	Accuracy 42.7	162
Multimodal Reasoning	LogicVista	Accuracy 39.4	147

Showing 10 of 92 rows
