Safety Guardrail Classification on 630-scenario real-world benchmark (independent set)
Top Verdict Accuracy: 95.4 (AgentTrust v0.5)
[Chart: results as of May 6, 2026; selectable metrics: Verdict Accuracy, False Positive Rate (FPR), False Negative Rate (FNR), Median Latency (ms)]
Evaluation Results
| Method | Configuration | Date | Verdict Accuracy (%) | False Positive Rate (%) | False Negative Rate (%) | Median Latency (ms) |
|---|---|---|---|---|---|---|
| AgentTrust v0.5 | System Configuration=r... | 2026.05 | 95.4 | 2.1 | 6.4 | 2.04 |
| AgentTrust v0.5 + LLM-Judge | System Configuration=h... | 2026.05 | 90.5 | 4.8 | 0.3 | 8.6 |
| DeepSeek-V3 | Evaluation Protocol=ze... | 2026.05 | 85.1 | 3.2 | 1.7 | 1,271 |
| NeMo Guardrails | Model Backend=DeepSeek-V3 | 2026.05 | 55.1 | 98.4 | 0 | 4,315 |
| Trivial regex blocklist | Patterns=50 patterns | 2026.05 | 37.9 | 0 | 85.2 | 0.03 |
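This page does not spell out how the four metrics are scored. The following is only a sketch of how such columns are commonly computed. It assumes that each scenario has a gold verdict and a predicted verdict, that "block" marks the unsafe (positive) class, and that any other verdict counts as not blocked. The label names, the `score` function and the `Scores` type are hypothetical and are not the benchmark's published scoring code.

```python
from dataclasses import dataclass
from statistics import median

# Assumption (hypothetical): the verdict "block" marks an unsafe action
# (positive class); any other verdict (e.g. "allow") counts as not blocked.
POSITIVE = "block"


@dataclass
class Scores:
    verdict_accuracy: float   # % of scenarios whose predicted verdict equals the gold verdict
    fpr: float                # % of gold-safe scenarios that were predicted "block"
    fnr: float                # % of gold-unsafe scenarios that were not predicted "block"
    median_latency_ms: float  # median per-scenario decision time


def _pct(num: int, den: int) -> float:
    """Percentage num/den; NaN when there are no scenarios in the denominator."""
    return 100.0 * num / den if den else float("nan")


def score(gold: list[str], pred: list[str], latencies_ms: list[float]) -> Scores:
    if not gold or not (len(gold) == len(pred) == len(latencies_ms)):
        raise ValueError("gold, pred and latencies_ms must be non-empty and equal length")

    pairs = list(zip(gold, pred))
    correct = sum(g == p for g, p in pairs)

    # Negatives: gold-safe scenarios; a false positive blocks one of them.
    negatives = [p for g, p in pairs if g != POSITIVE]
    false_pos = sum(p == POSITIVE for p in negatives)

    # Positives: gold-unsafe scenarios; a false negative lets one through.
    positives = [p for g, p in pairs if g == POSITIVE]
    false_neg = sum(p != POSITIVE for p in positives)

    return Scores(
        verdict_accuracy=_pct(correct, len(pairs)),
        fpr=_pct(false_pos, len(negatives)),
        fnr=_pct(false_neg, len(positives)),
        median_latency_ms=median(latencies_ms),
    )


if __name__ == "__main__":
    gold = ["block", "block", "allow", "allow", "allow"]
    pred = ["block", "allow", "allow", "block", "allow"]
    print(score(gold, pred, [5.0, 1.0, 2.0, 9.0, 3.0]))
```

The five toy scenarios in `__main__` produce 60.0% verdict accuracy, an FPR of about 33.3%, an FNR of 50.0% and a median latency of 3.0 ms. In this sketch, accuracy requires an exact verdict match. If the label set contains more than two verdicts, a mismatch between two non-"block" verdicts therefore lowers accuracy without changing FPR or FNR.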