Direct Prompt Injection robustness on Agent Security Bench DPI
[Chart: ASR and RR. Top entry: Phi-4, ASR 19 (Mar 3, 2026).]
Evaluation Results
Model            Method                      Date      ASR   RR
Phi-4            Safety Scaffolding=Bas...   2026.03   19    81
GPT-4o           Safety Scaffolding=MOS...   2026.03   21    79
GPT-5            Safety Scaffolding=MOS...   2026.03   26    67
Phi-4            Safety Scaffolding=MOS...   2026.03   28    72
Qwen3-4B-Think   Safety Scaffolding=MOS...   2026.03   29    71
GPT-5            Safety Scaffolding=Bas...   2026.03   42    0
Qwen2.5-7B       Safety Scaffolding=MOS...   2026.03   42    58
Qwen3-4B-Think   Safety Scaffolding=Bas...   2026.03   46    46
Qwen2.5-7B       Safety Scaffolding=Bas...   2026.03   55    42
GPT-4o           Safety Scaffolding=Bas...   2026.03   76    0