Agent Safety Evaluation

Benchmarks

Dataset Name	SOTA Method	Metric	Value		Updated
ToolEmu	SafeMCP	Safety	99	36	1mo ago
Agent-SafetyBench aggregated clean and five attack types	SAFEHARNESS	UBR	26.31	30	3mo ago
AgentHarm Libra	SafeMCP	Score	83	27	1mo ago
AgentHarm Benign Requests	GPT-4o	Safety Score	79	27	1mo ago
AgentHarm Harmful Requests	GPT-4o-mini	Score	59	27	1mo ago
AgentHarm (held-out)	FATE	HCR	12.5	10	2mo ago
AgentDojo held-out	FATE	ASR	46.8	10	2mo ago
Agent-SafetyBench	gpt-4o + GBT-SE	Agent-SafetyBench Score	72.3	8	2mo ago
AgentDojo	ECA	ASR	0	3	29d ago
VPI-Bench		UAR	16.99	2	2mo ago
VisualWebArena	ECA	Benign Rate	100	2	2mo ago
SafeToolBench		UAR	85.7	2	2mo ago
DocVQA	ECA	Benign Rate	100	2	2mo ago
AgentDyn	ECA	Benign Rate	100	2	2mo ago
