| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Hierarchy | Instruction Hierarchy Tutor Jailbreaks | Pass Rate | 95 | 4 |
| Phrase Protection | Instruction Hierarchy (test) | User Message Protection Accuracy | 97.5 | 4 |
| System Prompt Extraction | Instruction Hierarchy (test) | Attack Success Rate (Realistic User) | 99.7 | 4 |
| Instruction Following Safety | Instruction Hierarchy Phrase and Password Protection | Phrase Protection Adherence (User) | 91 | 2 |