Safety Detection on TS-Bench AgentHarm-Traj (eval)
[Chart: Latency (s/sample) and F1 Score of the evaluated methods; lowest latency: TS-Guard, 1.36 s/sample]
Evaluation Results
Method                   Date      Latency (s/sample)   F1 Score
TS-Guard                 2026.01   1.36                 90.16
GPT-4o                   2026.01   1.98                 84.8
AGrail (Model=GPT-4o)    2026.01   8.75                 85.75
ShieldAgent              2026.01   10                   -

A dash (-) means no F1 score was reported.
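To make the latency/F1 trade-off in the table easier to read, here is a small sketch that is not part of the benchmark. It lists which methods are Pareto-optimal, meaning no other method is at least as fast *and* at least as accurate while being strictly better on one of the two. The values are copied from the Evaluation Results table. The function name `pareto_front` and the decision to skip rows without an F1 score are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# (method, latency in s/sample, F1 score or None if not reported),
# copied from the Evaluation Results table.
RESULTS: List[Tuple[str, float, Optional[float]]] = [
    ("TS-Guard", 1.36, 90.16),
    ("GPT-4o", 1.98, 84.8),
    ("AGrail (Model=GPT-4o)", 8.75, 85.75),
    ("ShieldAgent", 10.0, None),
]


def pareto_front(rows: List[Tuple[str, float, Optional[float]]]) -> List[str]:
    """Return methods not dominated on (lower latency, higher F1).

    Hypothetical helper: rows without a reported F1 are skipped,
    since they cannot be compared on both axes.
    """
    scored = [(name, lat, f1) for name, lat, f1 in rows if f1 is not None]
    front = []
    for name, lat, f1 in scored:
        # A method is dominated if another is no slower and no less accurate,
        # and strictly better on at least one of the two metrics.
        dominated = any(
            o_lat <= lat and o_f1 >= f1 and (o_lat < lat or o_f1 > f1)
            for o_name, o_lat, o_f1 in scored
            if o_name != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    print(pareto_front(RESULTS))
```

On these numbers only TS-Guard is on the front, because it has both the lowest latency and the highest F1 among methods with a reported score.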