Defense against Indirect Prompt Injection on Filtered QA dataset

ASR (Naive): 97.65

[Chart: ASR (Naive) by date; x-axis tick at Nov 1, 2024]
Evaluation Results

Method    Links
2024.11
97.6597.95100100100
2024.11
95.5595.7599.95100100
2024.11
94.3596.45100100100
2024.11
94.3596.45100100100
2024.11
92.4595.9100100100
2024.11
85.991.181.795.2592.3
2024.11
79.0577.371.7584.3580.65
2024.11
77.888.8599.3599.7100
2024.11
76.758589.7591.788.75
2024.11
60.1568.3588.184.784.85
2024.11
10.5553.3588.2575.386
2024.11
10.5539.967.537.8551.2
2024.11
8.832.8576.3574.4556.6
2024.11
6.953580.164.4562.75
2024.11
4.86.151434.234.6
2024.11
2.53.0522.93.359.55
2024.11
2.233.7583.3567.477.75
2024.11
1.752.458.750.80.6
2024.11
1.451.70.750.684.95
2024.11
1.252.71.050.91.65
2024.11
0.850.70.80.954.1
2024.11
0.459.3549.557.321.25
2024.11
0.30.70.550.450.3
2024.11
0.251.71.050.551.45
2024.11
0.250.30.350.451.1
2024.11
0.11.817.70.050.1
2024.11
0.10.250.20.150.05
2024.11
0.050.350.30.11.35
2024.11
0.050.050.30.050.05
2024.11
0.050.250.20.150.05