Malicious behavior measurement on AgentHarm Harmful
[Chart: Harm Rate by model, axis ticks from 5 to 25.25. Highlighted point: GPT-5, Harm Rate 6, Mar 3, 2026. Selectable metrics: Harm Rate, Refusal Rate, Non-Refusal Harm Rate.]
Updated 1mo ago
Evaluation Results

| Method | Configuration | Date | Harm Rate | Refusal Rate | Non-Refusal Harm Rate |
|---|---|---|---|---|---|
| GPT-5 | Safety Scaffolding=MOS... | 2026.03 | 6 | 91 | 57 |
| Phi-4 | Safety Scaffolding=Bas... | 2026.03 | 6 | 94 | 68 |
| GPT-4o | Safety Scaffolding=MOS... | 2026.03 | 7 | 92 | 67 |
| Qwen3-4B-Think | Safety Scaffolding=MOS... | 2026.03 | 8 | 89 | 62 |
| Qwen2.5-7B | Safety Scaffolding=MOS... | 2026.03 | 9 | 87 | 52 |
| Qwen3-4B-Think | Safety Scaffolding=Bas... | 2026.03 | 9 | 86 | 59 |
| Phi-4 | Safety Scaffolding=MOS... | 2026.03 | 9 | 88 | 63 |
| GPT-5 | Safety Scaffolding=Bas... | 2026.03 | 11 | 0 | 11 |
| Qwen2.5-7B | Safety Scaffolding=Bas... | 2026.03 | 18 | 74 | 58 |
| GPT-4o | Safety Scaffolding=Bas... | 2026.03 | 31 | 0 | 31 |