AgentDojo

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Utility Assessment | AgentDojo (test) | Utility: 100 | 128 |
| Adversarial Attack Success Rate Assessment | AgentDojo | ASR: 0 | 56 |
| Agent Task Performance | AgentDojo Travel | Attack Success Rate: 13.57 | 24 |
| Prompt Injection Defense | AgentDojo New Attack 2 | Utility under Attack (UA): 89.78 | 23 |
| Prompt Injection Defense | AgentDojo New Attack 1 | Utility under Attack: 89.88 | 23 |
| Prompt Injection Defense | AgentDojo Important Instructions | Utility under Attack: 0.9041 | 23 |
| Prompt Injection Defense | AgentDojo No Attack | Benign Utility: 92.78 | 23 |
| Agent Task Performance | AgentDojo Banking | Attack Success Rate: 62.5 | 18 |
| Agent Planning | AgentDojo | TCR @ ∞: 80 | 16 |
| Guarded Agent Evaluation | AgentDojo full latest | ASR: 0.5616 | 14 |
| Indirect Prompt Injection | AgentDojo | Benign Utility: 64.36 | 12 |
| LLM Agent Defense | AgentDojo Overall | Clean Utility: 84.54 | 12 |
| LLM Agent Defense | AgentDojo Slack | Clean Utility: 80.95 | 12 |
| LLM Agent Defense | AgentDojo Workspace | Clean Utility: 85 | 12 |
| Prompt Injection Attack | AgentDojo Slack suite | Baseline ASR: 14.4 | 9 |
| Prompt Injection Defense | AgentDojo | Benign Utility: 77.3 | 8 |
| Prompt Injection Prevention | AgentDojo (test) | Banking Success Rate: 36 | 7 |
| Tool-agent system evaluation | Agentdojo | Utility (No Attack): 27.4 | 6 |
| Agentic Task Execution | AgentDojo Total | BU: 78.5 | 6 |
| Agentic Task Execution | AgentDojo Slack | BU: 86.7 | 6 |
| Agentic Task Execution | AgentDojo Banking | BU: 91.2 | 6 |
| Agentic Task Execution | AgentDojo Workspace | BU: 76.5 | 6 |
| Agentic Prompt Injection Defense | AgentDojo (test) | ASR (Direct): 0 | 6 |
| Prompt Injection Defense | AgentDojo Overall v1 (test) | CU: 37.11 | 6 |
| Prompt Injection Defense | AgentDojo Slack suite v1 (test) | CU: 61.9 | 6 |
Showing 25 of 34 rows