Prompt Injection Defense on LLMail full 22,899-attack pool
[Chart: ASR for Pipeline, Mar 13, 2026]
Evaluation Results
| Method | Model | Date | ASR |
|---|---|---|---|
| Pipeline | gpt-5-mini | 2026.03 | 0 |
| Two-Agent | gpt-5-mini | 2026.03 | 0.009 |
| JSON Validator | gpt-5-mini | 2026.03 | 0.4 |
| All combined | GPT-4o mini | 2026.03 | 2 |
| Baseline | gpt-5-mini | 2026.03 | 2.83 |
| TaskTracker | GPT-4o mini | 2026.03 | 5 |
| LLM-as-judge | GPT-4o mini | 2026.03 | 8 |
| PromptShield | GPT-4o mini | 2026.03 | 10 |
| Spotlighting | GPT-4o mini | 2026.03 | 15 |
| None (original) | Phi-3 / GPT-4o mini | 2026.03 | 30 |