LLM Agent Security Evaluation on Agent Security Bench (test)

Benign Utility (BU): 73.67

Repeat prompt

[Chart: Benign Utility (BU) over time, Oct 6, 2025]
Updated 26d ago

Evaluation Results

Method    Links
2025.10   73.67   67.12   69.12
2025.10   73.58   60.25   59.33
2025.10   72.83   68.75   68.75
2025.10   70.08   70.08   71.17
2025.10   64.25   63.42   16.33