GuardReasoner-Omni: A Reasoning-based Multi-modal Guardrail for Text, Image, Video, and Audio

About

We present GuardReasoner-Omni, a reasoning-based guardrail model designed to moderate text, image, video, and audio data. First, we construct a training corpus of 181k samples spanning these four modalities. Our training pipeline follows a two-stage paradigm that incentivizes the model to deliberate before making decisions: (1) supervised fine-tuning (SFT) to cold-start the model with explicit reasoning capabilities and structural adherence; and (2) reinforcement learning (RL) with a concise correctness reward to preserve accurate reasoning while suppressing redundant generation. We release models at 3B and 7B parameters. Experiments show that GuardReasoner-Omni outperforms existing state-of-the-art baselines across various guardrail benchmarks.
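
The abstract does not spell out the form of the concise correctness reward. As a purely illustrative sketch of the general idea (reward a correct verdict, discount it when the reasoning runs long), one hypothetical shape is shown below. The function name, the token budget, and the penalty weight are all assumptions made for illustration and are not taken from the paper.

```python
def concise_correctness_reward(predicted_label, gold_label, num_reasoning_tokens,
                               token_budget=512, length_weight=0.5):
    """Hypothetical reward: 1.0 for a correct verdict, discounted for overlong reasoning.

    token_budget and length_weight are illustrative assumptions, not values from the paper.
    """
    # Incorrect verdicts earn nothing, regardless of length.
    correct = 1.0 if predicted_label == gold_label else 0.0
    # Only tokens beyond the budget are penalized.
    overflow = max(0, num_reasoning_tokens - token_budget)
    # The penalty grows linearly with overflow and is capped at length_weight.
    penalty = length_weight * min(1.0, overflow / token_budget)
    return correct * (1.0 - penalty)


print(concise_correctness_reward("harmful", "harmful", 300))    # correct, within budget
print(concise_correctness_reward("harmful", "harmful", 768))    # correct, over budget
print(concise_correctness_reward("unharmful", "harmful", 300))  # incorrect
```

Under these assumed settings, a correct and short response keeps the full reward, a correct but overlong one keeps part of it, and an incorrect one gets none.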

Zhenhao Zhu, Yue Liu, Yanpei Guo, Wenjie Qu, Cancan Chen, Yufei He, Yibo Li, Yulin Chen, Tianyi Wu, Huiying Xu, Xinzhong Zhu, Jiaheng Zhang • 2026

Related benchmarks

Task	Dataset	Metric	Result
Response Harmfulness Detection	HarmBench	F1 Score	87.61	100
Response Harmfulness Detection	XSTEST-RESP	Response Harmfulness F1	95.48	76
Response Harmfulness Detection	Beavertails	F1 Score	86.04	59
Safety Classification	SafeRLHF	F1 Score	0.6844	48
Response Harmfulness Classification	WildGuard (test)	F1 (Total)	77.57	30
Prompt Harmfulness Detection	Text & Image Benchmarks Average	F1 Score	83.84	19
Prompt Harmfulness Detection	UCF-Crime	F1 Score	91.67	7
Prompt Harmfulness Detection	XD-Violence	F1 Score	96.82	7
Prompt Harmfulness Detection	FVC	F1 Score	67.86	7
Prompt Harmfulness Detection	HarmVideo	F1 Score	95.5	7

Showing 10 of 14 rows
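
Every result in the table is an F1 score, the harmonic mean of precision and recall on the positive class. A minimal sketch of the computation follows, assuming a binary task with the labels "harmful" and "unharmful" (the label names and toy data are illustrative, not drawn from the benchmarks). Most rows appear to report F1 on a 0-100 scale, while the SafeRLHF row appears to use 0-1, so the sketch prints both.

```python
def f1_score(y_true, y_pred, positive="harmful"):
    """Binary F1 for the `positive` label; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (illustrative only).
y_true = ["harmful", "harmful", "unharmful", "unharmful", "harmful"]
y_pred = ["harmful", "unharmful", "unharmful", "harmful", "harmful"]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.4f} (0-1 scale), {100 * score:.2f} (0-100 scale)")
```

Here precision and recall are both 2/3 (two true positives, one false positive, one false negative), so F1 is also 2/3.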
