From Evaluation to Defense: Advancing Safety in Video Large Language Models

About

While the safety risks of image-based large language models (Image LLMs) have been extensively studied, their video-based counterparts (Video LLMs) remain critically under-examined. To study this problem systematically, we introduce VideoSafetyEval, a large-scale, real-world benchmark for Video LLM safety that comprises 11.4k video-query pairs and spans 19 principal risk categories. Using this benchmark, we show that integrating the video modality degrades safety performance by an average of 34.2%, exposing systemic risks in multimodal attack exploitation. To address this vulnerability, we propose VideoSafety-R1, a dual-stage framework that achieves unprecedented safety gains through three innovations: (1) the VideoSafetyThinking dataset, which contains 46k video-query-thinking response triplets; (2) Alarm Token-Guided Safety Fine-Tuning (AT-SFT), which injects learnable alarm tokens into visual and textual sequences, enabling explicit harm perception across modalities via multitask objectives; and (3) safety-guided GRPO, which enhances defensive reasoning through dynamic policy optimization with rule-based rewards derived from dual-modality verification. Together, these components shift safety alignment from harm perception to active reasoning. The framework achieves a 71.1% improvement on VSE-HH, and improves by 59.1%, 44.3%, and 15.0% on the image safety datasets MMBench, VLGuard, and FigStep, respectively. Our code and dataset are available at https://github.com/Emiya-syw/VideoSafety-R1.git. Note: This paper contains harmful language and image examples, and reader discretion is recommended.
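As a rough illustration of the third component, the sketch below pairs a rule-based reward with the group-relative advantage normalization that GRPO is generally known for. This is not the authors' implementation. The function `rule_based_reward`, its arguments, and its weights are hypothetical stand-ins for "rewards derived from dual-modality verification", and the use of the population standard deviation is an assumption made here for simplicity.

```python
from statistics import mean, pstdev


def rule_based_reward(flagged_video_harmful: bool, flagged_text_harmful: bool,
                      video_is_harmful: bool, text_is_harmful: bool,
                      refused: bool) -> float:
    """Hypothetical reward; the weights are illustrative, not from the paper."""
    reward = 0.0
    # Perception: half a point for each modality whose harm label is judged correctly.
    reward += 0.5 * (flagged_video_harmful == video_is_harmful)
    reward += 0.5 * (flagged_text_harmful == text_is_harmful)
    # Behaviour: one point if the model refuses exactly when either modality is harmful.
    should_refuse = video_is_harmful or text_is_harmful
    reward += 1.0 * (refused == should_refuse)
    return reward


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one harmful-video, benign-text prompt.
group_rewards = [
    rule_based_reward(True, False, True, False, True),    # all checks pass
    rule_based_reward(False, False, True, False, True),   # misses the video harm
    rule_based_reward(False, True, True, False, False),   # all checks fail
    rule_based_reward(True, False, True, False, False),   # perceives harm, still complies
]
advantages = group_relative_advantages(group_rewards)
```

Responses that score above the group mean receive positive advantages and are reinforced, while those below the mean are discouraged; no separate value model is needed for this normalization step.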

Yiwei Sun, Peiqi Jiang, Chuanbin Liu, Luohao Lin, Zhiying Lu, Hongtao Xie • 2025

Related benchmarks

Task	Dataset	Result
Video Understanding	MVBench	Accuracy 64.7	563
Video Question Answering	NextQA	Accuracy 80.9	78
Video Understanding	VideoMME w/o sub	Score 59	29
Safety Evaluation	VLGuard	--	27
Safety Evaluation	VSE-HH	--	21
Safety Evaluation	MMBench	DSR 98	11
Safety Evaluation	FigStep	DSR 87	11
Safety Evaluation	VSE-SafeQ	FRR 0.8	11
Hallucination Detection	Video Hallucer	--	10
Safety Evaluation	VSE HH Base	DSR 89.5	5

Showing 10 of 12 rows
