Agent Safety Evaluation on Agent-SafetyBench

Agent-SafetyBench Score: 72.3

gpt-4o + GBT-SE

18.01232.10646.260.294
Jan 30, 2026
Updated 21d ago

Evaluation Results

Method  Links
2026.01  72.3  -     -
2026.01  70.8  -     -
2026.01  60.2  -     -
2026.01  56.8  -     -
2026.01  55.4  -     -
2026.01  50.4  -     -
2026.01  44.2  -     -
2026.01  20.1  -     -
2026.05  -     0     100
2026.05  -     72    42.1
2026.05  -     78.3  28.5
2026.05  -     74.7  39.3