Agent Harm Evaluation on AgentHarm public
[Chart: HarmScore over time. Plotted point: gpt-4o + GBT-SE, HarmScore 9.6, Jan 30, 2026]
Evaluation Results
| Method | Configuration | Date | HarmScore |
|---|---|---|---|
| gpt-4o + GBT-SE | Base Model=gpt-4o, Saf... | 2026.01 | 9.6 |
| llama-3-8b + GBT-SE | Base Model=llama-3-8b,... | 2026.01 | 10.3 |
| gpt-4o + GBT-Basic | Base Model=gpt-4o, Saf... | 2026.01 | 11.2 |
| llama-3-8b + GBT-Basic | Base Model=llama-3-8b,... | 2026.01 | 12.4 |
| llama-3-8b + Global guardrail only | Base Model=llama-3-8b,... | 2026.01 | 16.5 |
| gpt-4o + Global guardrail only | Base Model=gpt-4o, Saf... | 2026.01 | 19 |
| llama-3-8b (native) | Base Model=llama-3-8b,... | 2026.01 | 29.4 |
| gpt-4o (native) | Base Model=gpt-4o, Saf... | 2026.01 | 71.8 |
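
To make the comparison between rows easier to read, the scores listed above can be expressed as a drop relative to each base model's native entry. The following is only a minimal sketch: the `SCORES` dictionary and the `reduction_vs_native` helper are hypothetical names introduced here, and it assumes that a lower HarmScore is better, which the ordering of the results suggests but the page does not state.

```python
# HarmScore values copied from the results listed above.
# Keys are (base model, safety method); "native" means no added safety method.
SCORES = {
    ("gpt-4o", "GBT-SE"): 9.6,
    ("llama-3-8b", "GBT-SE"): 10.3,
    ("gpt-4o", "GBT-Basic"): 11.2,
    ("llama-3-8b", "GBT-Basic"): 12.4,
    ("llama-3-8b", "Global guardrail only"): 16.5,
    ("gpt-4o", "Global guardrail only"): 19.0,
    ("llama-3-8b", "native"): 29.4,
    ("gpt-4o", "native"): 71.8,
}


def reduction_vs_native(scores):
    """Return the HarmScore drop (in points) of each method versus the
    native entry of the same base model. Hypothetical helper."""
    drops = {}
    for (model, method), score in scores.items():
        if method == "native":
            continue  # the baseline is not compared with itself
        baseline = scores[(model, "native")]
        drops[(model, method)] = round(baseline - score, 1)
    return drops


if __name__ == "__main__":
    # Print the largest drops first.
    for (model, method), drop in sorted(
        reduction_vs_native(SCORES).items(), key=lambda kv: -kv[1]
    ):
        print(f"{model} + {method}: {drop} points below {model} (native)")
```

Under these assumptions, gpt-4o + GBT-SE sits 62.2 points below gpt-4o (native), and llama-3-8b + GBT-SE sits 19.1 points below llama-3-8b (native).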