Agent Task Performance on AgentDojo Overall
[Chart: Utility on AgentDojo Overall. Highlighted point: Task Shield, Utility 69.79, Dec 21, 2024. Metrics: Utility, Attack Success Rate.]
Evaluation Results
Method        Details                      Date      Utility   Attack Success Rate
Task Shield   Model=GPT-4o, Attack T...    2024.12   69.79     0.0207
Task Shield   Model=GPT-4o, Attack T...    2024.12   69.48     0.0111
No Defense    Model=GPT-4o, Attack T...    2024.12   68.52     0.0572
Task Shield   Model=GPT-4o, Attack T...    2024.12   66.93     0.0048
No Defense    Model=GPT-4o, Attack T...    2024.12   66.77     0.0541
No Defense    Model=GPT-4o, Attack T...    2024.12   50.08     0.4769