
AgentDojo

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Utility Assessment | AgentDojo (test) | Utility | 100 | 128 |
| Adversarial Attack Success Rate Assessment | AgentDojo | ASR | 0 | 88 |
| Utility Evaluation | AgentDojo | Utility | 78.4 | 32 |
| Indirect Prompt Injection Defense Evaluation | AgentDojo TOOLKNOWLEDGE attack suite | Latency (s) | 6.66 | 24 |
| Agent Task Performance | AgentDojo Travel | Attack Success Rate | 13.57 | 24 |
| Prompt Injection Defense | AgentDojo New Attack 2 | Utility under Attack (UA) | 89.78 | 23 |
| Prompt Injection Defense | AgentDojo New Attack 1 | Utility under Attack | 89.88 | 23 |
| Prompt Injection Defense | AgentDojo Important Instructions | Utility under Attack | 0.9041 | 23 |
| Prompt Injection Defense | AgentDojo No Attack | Benign Utility | 92.78 | 23 |
| Agentic Security and Utility Evaluation | AgentDojo | ASR | 0 | 22 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo Average across attacks | UA | 13.18 | 22 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo ToolKnowledge | Utility Score | 59.64 | 22 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo ImportantMsgs | Utility (UA) | 59.3 | 22 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo Combined | UA | 73.58 | 22 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo IgnorePrevious | Utility (UA) | 73.92 | 22 |
| LLM Agent Task Completion | AgentDojo No Attack | Benign Utility | 73.91 | 22 |
| Prompt Injection Attack | AgentDojo | ASR@1 | 64 | 21 |
| Agentic Security Evaluation | AgentDojo v1 (97 benign tasks, 27 injection tasks) | Utility Score | 60.8 | 20 |
| Agent Task Performance | AgentDojo Banking | Attack Success Rate | 62.5 | 18 |
| Agent Planning | AgentDojo | TCR @ ∞ | 80 | 16 |
| Agent Behavioral Safety | AgentDojo | Safety Rate | 97.1 | 14 |
| Guarded Agent Evaluation | AgentDojo full latest | ASR | 0.5616 | 14 |
| Prompt Injection Defense | AgentDojo | Benign Utility | 77.3 | 13 |
| Agent Security (Indirect Prompt Injection) | AgentDojo Overall (test) | ASR | 11.6 | 12 |
| Agent Security (Indirect Prompt Injection) | AgentDojo Slack (test) | ASR | 13.9 | 12 |
Showing 25 of 79 rows
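Most rows above report one of three quantities: benign utility, utility under attack (UA), or attack success rate (ASR). The sketch below shows one way to compute them from per-episode outcomes. It assumes that benign utility is the share of user tasks solved with no attack present, UA is the share of attacked episodes in which the user task is still solved, and ASR is the share of attacked episodes in which the injected goal is carried out. The `Run` record and the function names are hypothetical and illustrative only; they are not AgentDojo's own API. Values are returned as percentages, whereas a few rows in the table (e.g. 0.9041, 0.5616) appear to be reported as fractions on a 0 to 1 scale.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Run:
    """Hypothetical record of one agent episode (illustrative field names)."""
    user_task_solved: bool     # did the agent complete the user's task?
    injection_succeeded: bool  # was the attacker's injected goal executed? (False if no attack)


def rate(flags: Iterable[bool]) -> float:
    """Percentage of True values; returns 0.0 for an empty input."""
    values = list(flags)
    return 100.0 * sum(values) / len(values) if values else 0.0


def benign_utility(benign_runs: Iterable[Run]) -> float:
    # Share of user tasks solved when no injection is present.
    return rate(r.user_task_solved for r in benign_runs)


def utility_under_attack(attacked_runs: Iterable[Run]) -> float:
    # Share of attacked episodes in which the user task is still solved.
    return rate(r.user_task_solved for r in attacked_runs)


def attack_success_rate(attacked_runs: Iterable[Run]) -> float:
    # Share of attacked episodes in which the injected goal was carried out.
    return rate(r.injection_succeeded for r in attacked_runs)


if __name__ == "__main__":
    benign = [Run(True, False), Run(True, False), Run(False, False), Run(True, False)]
    attacked = [Run(True, False), Run(False, True), Run(True, True), Run(False, False)]
    print(benign_utility(benign))          # 75.0
    print(utility_under_attack(attacked))  # 50.0
    print(attack_success_rate(attacked))   # 50.0
```

Keeping UA and ASR as separate counters matters because an agent can still finish the user's task while also executing the injected instruction (the third attacked run above), so a high UA does not by itself imply a low ASR.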