Safety Guardrail Classification on 630-scenario real-world benchmark (independent set)

95.4 Verdict Accuracy

AgentTrust v0.5

Updated 2mo ago

Evaluation Results

Method	Links	Verdict Accuracy
AgentTrust v0.5 (2026.05)		95.4	2.1	6.4	2.04
AgentTrust v0.5 + LLM-Judge (2026.05)		90.5	4.8	0.3	8.6
DeepSeek-V3 (2026.05)		85.1	3.2	1.7	1,271
NeMo Guardrails (2026.05)		55.1	98.4	0	4,315
Trivial regex blocklist (2026.05)		37.9	0	85.2	0.03