
A3S-Bench

Benchmarks

Task Name                             Dataset Name            SOTA Result       Trend
Agent Security                        A3S-BENCH Advanced 1.0  RTR@1: 19.68      11
Agent Security                        A3S-BENCH Basic 1.0     RTR@1 (%): 45.27  11
Autonomous agent security evaluation  A3S-Bench               Metric: -         0