LLM Agent Security Evaluation on Agent Security Bench (test)
[Chart: Benign Utility (BU) over time. Plotted point: Repeat prompt, 73.67, Oct 6, 2025. Metrics available: Benign Utility (BU), Utility under Attack (UA), Attack Success Rate (ASR).]
Updated 26d ago
Evaluation Results
Method             Backbone  Date     Benign Utility (BU)  Utility under Attack (UA)  Attack Success Rate (ASR)
Repeat prompt      GPT-4o    2025.10  73.67                67.12                      69.12
Instr. Prevention  GPT-4o    2025.10  73.58                60.25                      59.33
None               GPT-4o    2025.10  72.83                68.75                      68.75
Spotlighting       GPT-4o    2025.10  70.08                70.08                      71.17
Sanitizer          GPT-4o    2025.10  64.25                63.42                      16.33
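To compare the defenses above against the undefended "None" row, the three scores can be differenced per method. The sketch below is illustrative only: the scores are copied from the results listed above, while the names `RESULTS`, `deltas_vs_baseline`, and `rank_by_asr` are hypothetical helpers and not part of the benchmark. It assumes all three metrics share one scale, so differences are reported as plain score points.

```python
# Scores copied from the results above: (BU, UA, ASR) per method, GPT-4o backbone.
# The dictionary name and helper functions are hypothetical, for illustration only.
RESULTS = {
    "Repeat prompt": (73.67, 67.12, 69.12),
    "Instr. Prevention": (73.58, 60.25, 59.33),
    "None": (72.83, 68.75, 68.75),
    "Spotlighting": (70.08, 70.08, 71.17),
    "Sanitizer": (64.25, 63.42, 16.33),
}


def deltas_vs_baseline(results, baseline="None"):
    """Return {method: (dBU, dUA, dASR)} relative to the baseline row."""
    base = results[baseline]
    return {
        name: tuple(round(v - b, 2) for v, b in zip(scores, base))
        for name, scores in results.items()
        if name != baseline
    }


def rank_by_asr(results):
    """Method names sorted by ascending ASR (lower ASR = fewer successful attacks)."""
    return sorted(results, key=lambda name: results[name][2])


if __name__ == "__main__":
    for name, (d_bu, d_ua, d_asr) in deltas_vs_baseline(RESULTS).items():
        print(f"{name:18s} dBU={d_bu:+.2f}  dUA={d_ua:+.2f}  dASR={d_asr:+.2f}")
    print("Lowest ASR first:", rank_by_asr(RESULTS))
```

Reporting the differences for all three metrics side by side keeps the trade-off visible: a method's change in ASR can be read next to its change in BU and UA rather than in isolation.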