Indirect Prompt Injection Attack Success Evaluation on Agent Action Goal-Adjacent 2
[Chart: IRany, GPT-5.4: 85 (May 14, 2026). Metrics: IRany, RRsem, RRemm, URemm, URctx, E2Esem, E2Eemm, E2Ectx]
Updated 16d ago
Evaluation Results
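| Method | Date | IRany | RRsem | RRemm | URemm | URctx | E2Esem | E2Eemm | E2Ectx |
|---|---|---|---|---|---|---|---|---|---|
| GPT-5.4 | 2026.05 | 85 | 100 | 89.4 | 85.5 | 83.5 | 72.7 | 65 | 71 |
| GPT-5.5 | 2026.05 | 85 | 100 | 97.6 | 81.9 | 77.6 | 69.6 | 67.9 | 66 |
| Kimi-K2.6 | 2026.05 | 84 | 100 | 97.6 | 81.7 | 78.6 | 68.6 | 67 | 66 |
| Gemini-3.1 | 2026.05 | 76 | 100 | 97.4 | 94.6 | 88.2 | 71.9 | 70 | 67 |
| Sonnet-4.6 | 2026.05 | 8 | 100 | 100 | 100 | 75 | 8 | 8 | 6 |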