GuardReasoner-Omni: A Reasoning-based Multi-modal Guardrail for Text, Image, Video, and Audio

About

We present GuardReasoner-Omni, a reasoning-based guardrail model designed to moderate text, image, video, and audio data. First, we construct a comprehensive training corpus comprising 181k samples spanning these four modalities. Our training pipeline follows a two-stage paradigm to incentivize the model to deliberate before making decisions: (1) conducting supervised fine-tuning (SFT) to cold-start the model with explicit reasoning capabilities and structural adherence; and (2) performing reinforcement learning (RL) with a concise correctness reward to preserve accurate reasoning while suppressing redundant generation. We release a suite of models at 3B and 7B parameters. Extensive experiments demonstrate that GuardReasoner-Omni achieves superior performance compared to existing state-of-the-art baselines across various guardrail benchmarks.

Zhenhao Zhu, Yue Liu, Yanpei Guo, Wenjie Qu, Cancan Chen, Yufei He, Yibo Li, Yulin Chen, Tianyi Wu, Huiying Xu, Xinzhong Zhu, Jiaheng Zhang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Response Harmfulness Detection | HarmBench | F1 Score | 87.61 | 100 |
| Response Harmfulness Detection | XSTEST-RESP | Response Harmfulness F1 | 95.48 | 76 |
| Response Harmfulness Detection | Beavertails | F1 Score | 86.04 | 59 |
| Safety Classification | SafeRLHF | F1 Score | 0.6844 | 48 |
| Response Harmfulness Classification | WildGuard (test) | F1 (Total) | 77.57 | 30 |
| Prompt Harmfulness Detection | Text & Image Benchmarks Average | F1 Score | 83.84 | 19 |
| Prompt Harmfulness Detection | UCF-Crime | F1 Score | 91.67 | 7 |
| Prompt Harmfulness Detection | XD-Violence | F1 Score | 96.82 | 7 |
| Prompt Harmfulness Detection | FVC | F1 Score | 67.86 | 7 |
| Prompt Harmfulness Detection | HarmVideo | F1 Score | 95.5 | 7 |
Showing 10 of 14 rows
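
Every result listed above is an F1 score, so a short reminder of how that metric is computed may help when comparing rows. The sketch below is illustrative only: the function name `f1_score`, the label strings `"harmful"` and `"unharmful"`, and the toy data are assumptions made for this example, not taken from the paper or from any benchmark's evaluation script.

```python
def f1_score(y_true, y_pred, positive="harmful"):
    """Binary F1 for the `positive` class (the label name is an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy data, made up for illustration: 2 true positives, 1 false positive,
# 1 false negative.
labels = ["harmful", "harmful", "harmful", "unharmful", "unharmful"]
preds = ["harmful", "harmful", "unharmful", "harmful", "unharmful"]
score = f1_score(labels, preds)
print(f"F1 = {score:.4f} ({100 * score:.2f} on a 0-100 scale)")
```

F1 lies between 0 and 1, and many leaderboards report it multiplied by 100. The SafeRLHF entry (0.6844) appears to use the 0 to 1 scale while the other rows appear to use the 0 to 100 scale, so the raw numbers are not directly comparable without rescaling.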
