
AgentHazard

Benchmarks

Task Name                                 | Dataset Name          | SOTA Result              | Trend
Safety Detection                          | AgentHazard Strongest | Accuracy: 90.87          | 56
Agentic Safety Moderation                 | AgentHazard           | ASR: 109                 | 5
Dataset Diversity and Coverage Evaluation | AgentHazard full      | Goal-text Entropy: 0.93  | 1
Dataset Diversity and Coverage Evaluation | AgentHazard 3-app     | Goal-text Entropy: 0.976 | 1