
The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs

About

Medical Multimodal Large Language Models (Medical MLLMs) have achieved remarkable progress in specialized medical tasks; however, research into their safety has lagged, posing potential risks for real-world deployment. In this paper, we first establish a multidimensional evaluation framework to systematically benchmark the safety of current SOTA Medical MLLMs. Our empirical analysis reveals pervasive vulnerabilities across both general and medical-specific safety dimensions in existing models, particularly highlighting their fragility against cross-modality jailbreak attacks. Furthermore, we find that the medical fine-tuning process frequently induces catastrophic forgetting of the model's original safety alignment. To address this challenge, we propose a novel "Parameter-Space Intervention" approach for efficient safety re-alignment. This method extracts intrinsic safety knowledge representations from original base models and concurrently injects them into the target model during the construction of medical capabilities. Additionally, we design a fine-grained parameter search algorithm to achieve an optimal trade-off between safety and medical performance. Experimental results demonstrate that our approach significantly bolsters the safety guardrails of Medical MLLMs without relying on additional domain-specific safety data, while minimizing degradation to core medical performance.

Jiale Zhao, Xing Mou, Jinlin Wu, Hongyuan Yu, Mingrui Sun, Yang Shi, Xuanwu Yin, Zhen Chen, Zhen Lei, Yaohua Wang • 2025
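
The abstract describes the "Parameter-Space Intervention" only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' algorithm. It swaps in a generic task-arithmetic style procedure: a "safety vector" is taken as the weight difference between a safety-aligned checkpoint and a reference checkpoint, added to the medically fine-tuned weights with a scaling factor, and the factor is chosen by a simple grid search. The names `safety_vector`, `graft`, `search_alpha`, the `alpha` coefficient, and the `score` callback are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def safety_vector(aligned: Weights, reference: Weights) -> Weights:
    """Assumed definition: per-parameter difference (aligned - reference)."""
    return {
        name: [a - r for a, r in zip(aligned[name], reference[name])]
        for name in aligned
    }


def graft(target: Weights, vector: Weights, alpha: float) -> Weights:
    """Add the scaled safety vector to the target (medical) weights."""
    return {
        name: [t + alpha * v for t, v in zip(target[name], vector[name])]
        for name in target
    }


def search_alpha(
    target: Weights,
    vector: Weights,
    alphas: Sequence[float],
    score: Callable[[Weights], float],
) -> float:
    """Grid search: return the alpha whose grafted weights score highest.

    `score` is a hypothetical callback that would combine safety and
    medical benchmark results into a single number.
    """
    return max(alphas, key=lambda a: score(graft(target, vector, a)))
```

In practice the `score` callback would have to evaluate the grafted model on both safety and medical benchmarks; how the paper weighs the two, and at what granularity it searches, is not stated in the abstract.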

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | PubMedQA | Accuracy | 75.6 | 145 |
| Medical Visual Question Answering | VQA-RAD | Accuracy | 66.3 | 106 |
| Question Answering | MedQA USMLE | Accuracy | 62.18 | 18 |
| Question Answering | Medbullets-4 | Accuracy | 60.06 | 15 |
| Safety Evaluation | CATQA Direct | Safety Score (1-ASR) | 1 | 8 |
| Safety Evaluation | CATQA FigStep | Safety Score | 100 | 8 |
| Safety Evaluation | HEx-PHI-FigStep | Safety Score (1-ASR) | 1 | 8 |
| Medical Question Answering | Medbullets op5 | Accuracy | 52.27 | 8 |
| Medical Safety Evaluation | MedSafetyBench Direct | Safety Score | 98 | 8 |
| Medical Safety Evaluation | MedSafetyBench FigStep | Safety Score (1-ASR) | 0.9978 | 8 |
Showing 10 of 23 rows
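
Several rows report "Safety Score (1-ASR)", where ASR is the attack success rate. As a minimal sketch, assuming each attack attempt is recorded as a boolean (True meaning the attack succeeded), the score could be computed as below. The function name `safety_score` and the input format are illustrative assumptions, not taken from the benchmarks themselves.

```python
from typing import Sequence


def safety_score(attack_succeeded: Sequence[bool]) -> float:
    """Return 1 - ASR, where ASR is the fraction of successful attacks."""
    if not attack_succeeded:
        raise ValueError("need at least one attack attempt")
    asr = sum(attack_succeeded) / len(attack_succeeded)
    return 1.0 - asr


# One successful attack out of four attempts gives ASR = 0.25.
print(safety_score([True, False, False, False]))  # 0.75
```

The rows labelled plain "Safety Score" with values of 100 and 98 may report the same quantity as a percentage, but the listing does not say so.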
