
From Evaluation to Defense: Advancing Safety in Video Large Language Models

About

While the safety risks of image-based large language models (Image LLMs) have been extensively studied, their video-based counterparts (Video LLMs) remain critically under-examined. To systematically study this problem, we introduce VideoSafetyEval - a large-scale, real-world benchmark for Video LLM safety, which comprises 11.4k video-query pairs and spans 19 principal risk categories. Based on this, we reveal that integrating video modality degrades safety performance by an average of 34.2%, thereby exposing systemic risks in multimodal attack exploitation. To address this vulnerability, we propose VideoSafety-R1, a dual-stage framework achieving unprecedented safety gains through three innovations: (1) the VideoSafetyThinking dataset contains 46k video-query-thinking response triplets; (2) Alarm Token-Guided Safety Fine-Tuning (AT-SFT) injects learnable alarm tokens into visual and textual sequences, enabling explicit harm perception across modalities via multitask objectives; and (3) safety-guided GRPO enhances defensive reasoning through dynamic policy optimization with rule-based rewards derived from dual-modality verification. These components synergize to shift safety alignment from harm perception to active reasoning. The framework achieves a 71.1% improvement on VSE-HH, and improves by 59.1%, 44.3%, and 15.0% on the image safety datasets MMBench, VLGuard, and FigStep, respectively. Our code and dataset are available at https://github.com/Emiya-syw/VideoSafety-R1.git. Note: This paper contains harmful language and image examples, and reader discretion is recommended.

Yiwei Sun, Peiqi Jiang, Chuanbin Liu, Luohao Lin, Zhiying Lu, Hongtao Xie • 2025

Related benchmarks

Task | Dataset | Result | Rank
Video Understanding | MVBench | Accuracy 64.7 | 425
Video Question Answering | NextQA | Accuracy 80.9 | 78
Safety Evaluation | VLGuard | -- | 24
Safety Evaluation | VSE-HH | -- | 21
Video Understanding | VideoMME w/o sub | Score 59 | 18
Safety Evaluation | MMBench | DSR 98 | 11
Safety Evaluation | FigStep | DSR 87 | 11
Safety Evaluation | VSE-SafeQ | FRR 0.8 | 11
Hallucination Detection | Video Hallucer | -- | 10
Safety Evaluation | VSE HH Base | DSR 89.5 | 5
Showing 10 of 12 rows
