
WASP

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Prompt injection defense | WASP | Attack Success Rate (ASR): 0 | 16 |
| Agent Planning Security and Autonomy | WASP Reddit (test) | Attack Success Rate: 0 | 8 |
| Agent Planning Security and Autonomy | WASP GitLab (test) | Attack Success Rate: 29.2 | 8 |
| Computer-Using Agent Task | WASP 1.0 (test) | PCR: 97.6 | 5 |
| Label Prediction | WASP | Accuracy: 90.6 | 4 |