Agent Harm Evaluation on AgentHarm public

HarmScore: 9.6

gpt-4o + GBT-SE

Updated 4mo ago

Evaluation Results

Method	Links	HarmScore
gpt-4o + GBT-SE 2026.01		9.6
llama-3-8b + GBT-SE 2026.01		10.3
gpt-4o + GBT-Basic 2026.01		11.2
llama-3-8b + GBT-Basic 2026.01		12.4
llama-3-8b + Global guardrail only 2026.01		16.5
gpt-4o + Global guardrail only 2026.01		19
llama-3-8b (native) 2026.01		29.4
gpt-4o (native) 2026.01		71.8