Agent Safety Evaluation on Agent-SafetyBench

Agent-SafetyBench Score: 72.3

gpt-4o + GBT-SE

18.01232.10646.260.294
Jan 30, 2026
Updated 1mo ago

Evaluation Results

Date      Score
2026.01   72.3
2026.01   70.8
2026.01   60.2
2026.01   56.8
2026.01   55.4
2026.01   50.4
2026.01   44.2
2026.01   20.1