Safety Classification on MultiJail
[Chart: F1 Score over time, Dec 2, 2025 to May 1, 2026. Best result: CREST-BASE, F1 Score 0.9335. Updated 1mo ago.]
Evaluation Results
Method             Details                    Date     F1 Score
CREST-BASE         Model Variant=Base, Ba...  2025.12  0.9335
CREST-LARGE        Model Variant=Large, B...  2025.12  0.9329
Nemotron           Parameters=8B              2026.05  0.85
ML-GUARD           Parameters=7B              2026.05  0.85
ML-GUARD           Parameters=1.5B            2026.05  0.83
gpt-oss-safeguard  Parameters=20B             2026.05  0.8
PolyGuard-Qwen     -                          2026.05  0.78
Qwen3Guard-Gen     Parameters=8B              2026.05  0.77
Qwen3Guard-Gen     Parameters=4B              2026.05  0.74
Qwen3Guard-Gen     Parameters=0.6B            2026.05  0.72
Omni-moderation    -                          2026.05  0.66
Llama-Guard-3      Parameters=8B              2026.05  0.63
DuoGuard           Parameters=1.5B            2026.05  0.58
Llama-Guard-4      Parameters=12B             2026.05  0.57
Llama-Guard-3      Parameters=1B              2026.05  0.45