
OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning

About

Omni-modal large language models (OLLMs), which process text, images, video, and audio, introduce new challenges for safety and value guardrails in human-AI interaction. Prior guardrail research largely targets unimodal settings and typically frames safeguarding as binary classification, which limits robustness across diverse modalities and tasks. To address this gap, we propose OmniGuard, the first family of omni-modal guardrails that performs safeguarding across all modalities with deliberate reasoning ability. To train OmniGuard, we curate a large, comprehensive omni-modal safety dataset of over 210K diverse samples whose inputs cover all modalities through both unimodal and cross-modal samples. Each sample is annotated with structured safety labels and carefully curated safety critiques obtained from expert models through targeted distillation. Extensive experiments on 15 benchmarks show that OmniGuard achieves strong effectiveness and generalization across a wide range of multimodal safety scenarios. Importantly, OmniGuard provides a unified framework for enforcing policies and mitigating risks across all modalities, paving the way toward more robust and capable omni-modal safeguarding systems.

Boyu Zhu, Xiaofei Wen, Wenjie Jacky Mo, Tinghui Zhu, Yanan Xie, Peng Qi, Muhao Chen• 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Text-based safety moderation | OpenAI | F1 Score | 81.1 | 26
Safety evaluation | UnsafeBench | F1 Score | 72.3 | 24
Text-based safety moderation | Toxic Chat | F1 Score | 67 | 24
Text-based safety moderation | Beavertails | F1 Score | 83.9 | 19
Unsafe content detection | LlavaGuard | Accuracy | 78.2 | 14
Text-based safety moderation | Aegis | F1 Score | 84 | 12
Text-based safety moderation | WildGuard | F1 Score | 78.6 | 12
Unsafe content detection | VLGuard | F1 Score | 79.3 | 9
Video safety moderation | SafeWatch | F1 Score | 92.3 | 9
Video safety moderation | SafeSora | F1 Score | 71.8 | 9

(10 of 12 rows shown.)
