
LLM Agent Security Evaluation on Agent Security Bench (test)

Top Benign Utility (BU): 73.67 (Repeat prompt)

Updated 4mo ago

Evaluation Results

Method	Date	Links	Benign Utility (BU)	(unlabeled)	(unlabeled)
Repeat prompt	2025.10		73.67	67.12	69.12
Instr. Prevention	2025.10		73.58	60.25	59.33
None	2025.10		72.83	68.75	68.75
Spotlighting	2025.10		70.08	70.08	71.17
Sanitizer	2025.10		64.25	63.42	16.33