Agent Security Bench

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Benign completion reliability | Agent Security Bench Benign | Completion Reliability: 99 | 10
Indirect Prompt Injection robustness | Agent Security Bench IPI | Attack Success Rate (ASR, lower is better): 2 | 10
Direct Prompt Injection robustness | Agent Security Bench DPI | ASR (lower is better): 19 | 10
LLM Agent Security Evaluation | Agent Security Bench (test) | Benign Utility (BU): 73.67 | 5
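The table does not define its metrics. The sketch below makes some assumptions. ASR is taken to be the share of attacked runs in which the agent followed the injected instruction. Benign Utility is taken to be the share of attack-free runs in which the agent completed the user's original task. Under those assumptions, both metrics reduce to simple percentages. The `Trial` record and the function names are hypothetical illustrations, not part of Agent Security Bench. The leaderboard also does not state whether its figures are percentages.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical record of one agent run; field names are illustrative only.
    attacked: bool          # an injected (malicious) instruction was present
    attack_succeeded: bool  # the agent followed the injected instruction
    task_completed: bool    # the agent completed the user's original task


def attack_success_rate(trials):
    """Percentage of attacked runs in which the attack succeeded (lower is better)."""
    attacked = [t for t in trials if t.attacked]
    if not attacked:
        raise ValueError("no attacked trials")
    return 100.0 * sum(t.attack_succeeded for t in attacked) / len(attacked)


def benign_utility(trials):
    """Percentage of attack-free runs that completed the original task (higher is better)."""
    benign = [t for t in trials if not t.attacked]
    if not benign:
        raise ValueError("no benign trials")
    return 100.0 * sum(t.task_completed for t in benign) / len(benign)


# Usage: four attacked runs (one successful attack) and two benign runs (one completed).
trials = [
    Trial(attacked=True, attack_succeeded=True, task_completed=False),
    Trial(attacked=True, attack_succeeded=False, task_completed=True),
    Trial(attacked=True, attack_succeeded=False, task_completed=True),
    Trial(attacked=True, attack_succeeded=False, task_completed=False),
    Trial(attacked=False, attack_succeeded=False, task_completed=True),
    Trial(attacked=False, attack_succeeded=False, task_completed=False),
]
print(attack_success_rate(trials))  # 25.0
print(benign_utility(trials))       # 50.0
```

The two metrics are computed over disjoint subsets of runs. ASR uses only attacked runs, and Benign Utility uses only attack-free runs. A model could therefore score well on one and poorly on the other.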