From Evaluation to Defense: Advancing Safety in Video Large Language Models
About
While the safety risks of image-based large language models (Image LLMs) have been extensively studied, their video-based counterparts (Video LLMs) remain critically under-examined. To systematically study this problem, we introduce VideoSafetyEval, a large-scale, real-world benchmark for Video LLM safety that comprises 11.4k video-query pairs and spans 19 principal risk categories. Using this benchmark, we show that integrating the video modality degrades safety performance by an average of 34.2%, exposing systemic risks in multimodal attack exploitation. To address this vulnerability, we propose VideoSafety-R1, a dual-stage framework that achieves substantial safety gains through three innovations: (1) the VideoSafetyThinking dataset, which contains 46k video-query-thinking response triplets; (2) Alarm Token-Guided Safety Fine-Tuning (AT-SFT), which injects learnable alarm tokens into visual and textual sequences, enabling explicit harm perception across modalities via multitask objectives; and (3) safety-guided GRPO, which enhances defensive reasoning through dynamic policy optimization with rule-based rewards derived from dual-modality verification. Together, these components shift safety alignment from harm perception to active reasoning. The framework achieves a 71.1% improvement on VSE-HH, and improves by 59.1%, 44.3%, and 15.0% on the image safety datasets MMBench, VLGuard, and FigStep, respectively. Our code and dataset are available at https://github.com/Emiya-syw/VideoSafety-R1.git. Note: This paper contains harmful language and image examples, and reader discretion is recommended.
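To make the phrase "rule-based rewards derived from dual-modality verification" more concrete, the following is a minimal sketch of how such a reward could feed the group-relative advantage used in GRPO-style training. It is not the authors' implementation: the `Sample` and `Rollout` fields, the three-point scoring rule, and the use of the population standard deviation are all assumptions made for illustration. Only the general idea of normalizing each rollout's reward against its sampling group comes from GRPO itself.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Sample:
    # Hypothetical ground-truth harm labels, one per modality.
    video_harmful: bool
    text_harmful: bool


@dataclass
class Rollout:
    # Hypothetical fields parsed from one sampled model response.
    pred_video_harmful: bool
    pred_text_harmful: bool
    refused: bool


def rule_based_reward(sample: Sample, rollout: Rollout) -> float:
    """Toy reward (an assumption, not the paper's rule): one point for each
    correctly judged modality, plus one point for the right refuse/answer decision."""
    reward = 0.0
    reward += float(rollout.pred_video_harmful == sample.video_harmful)
    reward += float(rollout.pred_text_harmful == sample.text_harmful)
    should_refuse = sample.video_harmful or sample.text_harmful
    reward += float(rollout.refused == should_refuse)
    return reward


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of rollouts for the same prompt:
    subtract the group mean and divide by the group standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero spread
    return [(r - mu) / (sigma + eps) for r in rewards]


# One harmful-video / benign-text sample with three sampled responses.
sample = Sample(video_harmful=True, text_harmful=False)
group = [
    Rollout(pred_video_harmful=True, pred_text_harmful=False, refused=True),
    Rollout(pred_video_harmful=False, pred_text_harmful=False, refused=False),
    Rollout(pred_video_harmful=True, pred_text_harmful=True, refused=True),
]
rewards = [rule_based_reward(sample, r) for r in group]
advantages = group_relative_advantages(rewards)
```

In this sketch, the response that judges both modalities correctly and refuses receives the highest advantage, while the one that misses the harmful video and answers anyway receives the lowest. Because advantages are relative to the group, a prompt on which every rollout scores the same contributes no learning signal.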
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Understanding | MVBench | Accuracy 64.7 | 425 |
| Video Question Answering | NextQA | Accuracy 80.9 | 78 |
| Safety Evaluation | VLGuard | -- | 24 |
| Safety Evaluation | VSE-HH | -- | 21 |
| Video Understanding | VideoMME w/o sub | Score 59 | 18 |
| Safety Evaluation | MMBench | DSR 98 | 11 |
| Safety Evaluation | FigStep | DSR 87 | 11 |
| Safety Evaluation | VSE-SafeQ | FRR 0.8 | 11 |
| Hallucination Detection | Video Hallucer | -- | 10 |
| Safety Evaluation | VSE HH Base | DSR 89.5 | 5 |