Benign tool-calling reliability on AgentHarm Benign
[Chart: Refusal Rate and NR Score for GPT-4o; data point dated Mar 3, 2026. Updated 1mo ago.]
Evaluation Results
| Model | Method | Date | Refusal Rate | NR Score |
|---|---|---|---|---|
| GPT-4o | Safety Scaffolding=Bas... | 2026.03 | 0 | 0.8 |
| GPT-5 | Safety Scaffolding=Bas... | 2026.03 | 0 | 0.68 |
| Qwen2.5-7B | Safety Scaffolding=Bas... | 2026.03 | 13 | 0.51 |
| Qwen3-4B-Think | Safety Scaffolding=Bas... | 2026.03 | 13 | 0.66 |
| Qwen2.5-7B | Safety Scaffolding=MOS... | 2026.03 | 15 | 0.61 |
| GPT-4o | Safety Scaffolding=MOS... | 2026.03 | 19 | 0.75 |
| Phi-4 | Safety Scaffolding=MOS... | 2026.03 | 19 | 0.75 |
| Qwen3-4B-Think | Safety Scaffolding=MOS... | 2026.03 | 23 | 0.7 |
| GPT-5 | Safety Scaffolding=MOS... | 2026.03 | 24 | 0.73 |
| Phi-4 | Safety Scaffolding=Bas... | 2026.03 | 43 | 0.77 |