AgentDojo

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Utility Assessment | AgentDojo (test) | Utility | 100 | 128 |
| Adversarial Attack Success Rate Assessment | AgentDojo | ASR | 0 | 88 |
| Utility Evaluation | AgentDojo | Utility | 78.4 | 32 |
| Indirect Prompt Injection Defense Evaluation | AgentDojo TOOLKNOWLEDGE attack suite | Latency (s) | 6.66 | 24 |
| Agent Task Performance | AgentDojo Travel | Attack Success Rate | 13.57 | 24 |
| Prompt Injection Defense | AgentDojo New Attack 2 | Utility under Attack (UA) | 89.78 | 23 |
| Prompt Injection Defense | AgentDojo New Attack 1 | Utility under Attack | 89.88 | 23 |
| Prompt Injection Defense | AgentDojo Important Instructions | Utility under Attack | 0.9041 | 23 |
| Prompt Injection Defense | AgentDojo No Attack | Benign Utility | 92.78 | 23 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo Average across attacks | UA | 13.18 | 22 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo ToolKnowledge | Utility Score | 59.64 | 22 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo ImportantMsgs | Utility (UA) | 59.3 | 22 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo Combined | UA | 73.58 | 22 |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo IgnorePrevious | Utility (UA) | 73.92 | 22 |
| LLM Agent Task Completion | AgentDojo No Attack | Benign Utility | 73.91 | 22 |
| Prompt Injection Attack | AgentDojo | ASR@1 | 64 | 21 |
| Agent Task Performance | AgentDojo Banking | Attack Success Rate | 62.5 | 18 |
| Agent Planning | AgentDojo | TCR @ ∞ | 80 | 16 |
| Guarded Agent Evaluation | AgentDojo full latest | ASR | 0.5616 | 14 |
| Prompt Injection Defense | AgentDojo | Benign Utility | 77.3 | 13 |
| Agent Defense Evaluation | AgentDojo | Utility under Attack | 69.15 | 12 |
| Indirect Prompt Injection | AgentDojo | Benign Utility | 64.36 | 12 |
| LLM Agent Defense | AgentDojo Overall | Clean Utility | 84.54 | 12 |
| LLM Agent Defense | AgentDojo Slack | Clean Utility | 80.95 | 12 |
| LLM Agent Defense | AgentDojo Workspace | Clean Utility | 85 | 12 |
Showing 25 of 59 rows