
Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs

About

Multimodal Large Language Models (MLLMs) pose critical safety challenges, as they are susceptible not only to adversarial attacks such as jailbreaking but also to inadvertently generating harmful content for benign users. While internal safety alignment via Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) is a primary mitigation strategy, current methods often face a safety-utility trade-off: they either refuse benign queries out of excessive caution or overlook latent risks in cross-modal interactions. To resolve this, we introduce Pragma-VL, an end-to-end alignment algorithm that enables MLLMs to pragmatically arbitrate between safety and helpfulness. First, we enhance visual risk perception with a novel cold-start SFT stage. This is achieved by applying risk-aware clustering to the visual encoder and using an interleaved dataset of risk descriptions and high-quality data. Second, we introduce a theoretically-guaranteed reward model that leverages synergistic learning. We train it with a novel data augmentation method that assigns dynamic weights based on the queries, enabling contextual arbitration between safety and helpfulness. Extensive experiments show that Pragma-VL effectively balances safety and helpfulness, outperforming baselines by 5% to 20% on most multimodal safety benchmarks while preserving its general capabilities in areas such as mathematics and knowledge reasoning.
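The abstract describes contextual arbitration only at a high level: query-dependent weights balance a safety signal against a helpfulness signal. One way to picture this is as a convex combination of two scores. The sketch below is an illustration under that assumption, not the paper's reward model. The function name `arbitrate_reward`, the parameter `risk_weight`, and the convex-combination form are all hypothetical.

```python
def arbitrate_reward(r_safety: float, r_helpful: float, risk_weight: float) -> float:
    """Blend a safety score and a helpfulness score into one scalar reward.

    Assumption (not taken from the paper): `risk_weight` lies in [0, 1] and is
    produced elsewhere from the query, with riskier queries getting a larger
    value so that the safety score dominates the blended reward.
    """
    if not 0.0 <= risk_weight <= 1.0:
        raise ValueError("risk_weight must lie in [0, 1]")
    # Convex combination: the weight slides the reward between the two scores.
    return risk_weight * r_safety + (1.0 - risk_weight) * r_helpful


if __name__ == "__main__":
    # Same pair of scores, evaluated under a low-risk and a high-risk query weight.
    print(arbitrate_reward(r_safety=0.2, r_helpful=0.9, risk_weight=0.1))
    print(arbitrate_reward(r_safety=0.2, r_helpful=0.9, risk_weight=0.9))
```

With a fixed pair of scores, raising `risk_weight` moves the blended reward toward the safety score, which is the behaviour the phrase "dynamic weights based on the queries" suggests. How Pragma-VL actually derives and applies its weights is not specified in this summary.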

Ming Wen, Kun Yang, Xin Chen, Jingyu Zhang, Dingding Han, Shiwen Cui, Yuedong Xu • 2026

Related benchmarks

Task | Dataset | Result | Rank
Visual Question Answering | VQA v2 | -- | 1362
Visual Question Answering | TextVQA | Accuracy: 83.75 | 1285
Visual Question Answering | GQA | Accuracy: 61.42 | 505
Mathematical Reasoning | MathVista | Accuracy: 67.2 | 257
Question Answering | ScienceQA | Accuracy: 89.06 | 77
Visual Question Answering | VizWizQA | Accuracy: 78.9 | 37
Safety Evaluation | Beavertails-V (test) | Helpfulness Score: 86.93 | 20
Safety Evaluation | SPA-VL (test) | Helpfulness Score: 97.93 | 20
Safety Evaluation | MSSbench (test) | Effectiveness Score: 99.66 | 20
Safety Evaluation | MM-SafetyBench (test) | Helpfulness Score: 68.37 | 20
Showing 10 of 11 rows
