
WASP

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Agent Planning Security and Autonomy | WASP Reddit (test) | Attack Success Rate: 0 | 8 |
| Agent Planning Security and Autonomy | WASP GitLab (test) | Attack Success Rate: 29.2 | 8 |
| Computer-Using Agent Task | WASP 1.0 (test) | PCR: 97.6 | 5 |
| Label Prediction | WASP | Accuracy: 90.6 | 4 |