Binary Safe/Unsafe Classification on R-Judge (test)
Top result: BraveGuard-Qwen3-Guard-8B, 57.8 accuracy (May 31, 2026)
Evaluation Results
| Method | Details | Date | Accuracy | Recall | F1 Score |
|---|---|---|---|---|---|
| BraveGuard-Qwen3-Guard-8B | Backbone=Qwen3-Guard,... | 2026.05 | 57.8 | 91.2 | 69.7 |
| NemoGuard | | 2026.05 | 54.4 | 40.6 | 48.5 |
| Llama3.1-8B-Instruct | Parameters=8B | 2026.05 | 53.7 | 100 | 69.5 |
| Qwen3-Guard | | 2026.05 | 40.6 | 5.5 | 9 |
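The Accuracy, Recall, and F1 Score columns above can be related to one another with a short sketch. It assumes the standard definitions of these metrics and that "unsafe" is the positive class; the page does not state either, and the function names `binary_metrics` and `implied_precision` are hypothetical helpers, not part of any benchmark tooling. The second helper rearranges F1 = 2PR / (P + R) to recover the precision implied by a reported recall and F1.

```python
def binary_metrics(y_true, y_pred, positive="unsafe"):
    """Return accuracy, precision, recall, and F1 as percentages.

    Assumes `positive` names the class treated as positive
    ("unsafe" here is an assumption, not confirmed by the leaderboard).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": 100 * accuracy,
        "precision": 100 * precision,
        "recall": 100 * recall,
        "f1": 100 * f1,
    }


def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for P, with all values in percent."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("recall and F1 are inconsistent")
    return f1 * recall / denom


if __name__ == "__main__":
    # Reported (recall, F1) pairs copied from the table above.
    reported = {
        "BraveGuard-Qwen3-Guard-8B": (91.2, 69.7),
        "NemoGuard": (40.6, 48.5),
        "Llama3.1-8B-Instruct": (100.0, 69.5),
        "Qwen3-Guard": (5.5, 9.0),
    }
    for name, (recall, f1) in reported.items():
        print(f"{name}: implied precision {implied_precision(recall, f1):.1f}")
```

Under these assumptions, a recall of 100 paired with an accuracy near 50 is the pattern expected from a classifier that flags almost every case as unsafe, which is why recall alone is a weak basis for comparing the rows.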