Full-response Safety Guardrail Classification on Safe-RLHF (test)
[Figure: F1 Score over time. Top entry: Qwen3Guard-8B, F1 Score 93.2 (Jun 1, 2026).]
Evaluation Results

Method            Settings                   Date     F1 Score
Qwen3Guard-8B     -                          2026.06  93.2
SentGuard         backbone=Qwen3-4B-Inst...  2026.06  92.5
WildGuard-7B      -                          2026.06  92.3
GPT-5.5           protocol=zero-shot         2026.06  90.7
LlamaGuard3-8B    -                          2026.06  88.7
Gemini-3.5-Flash  protocol=zero-shot         2026.06  86.9
LlamaGuard4-12B   -                          2026.06  86.3