
Agent-SafetyBench

Benchmarks

Task Name          Dataset Name       SOTA Result                Trend
Agentic Oversight  Agent-SafetyBench  Detection Accuracy: 84.06  42