AgentHarm

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Illicit task completion | AgentHarm English prompts | AgentHarm Score (AHS): 72.7 | 20
Step-level tool invocation safety detection | AgentHarm Traj | Accuracy: 84.81 | 20
Guarded Agent Evaluation | AgentHarm latest (full) | Refusal Rate: 97.16 | 14