| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hateful meme classification | HarM (test) | AUC 91.03 | 31 |
| Jailbreak Attempt Forgetting | Harm Jailbreak 2 | ASR 71.4 | 28 |
| Jailbreak Attempt Forgetting | JB-1 Jailbreak Harm-1 | ASR (%) 73.1 | 28 |
| Harmful Question Forgetting | Harm-2 GPTFUZZER WildAttack | Attack Success Rate (ASR) 0 | 28 |
| Harmful Question Forgetting | Harm-1 GPTFUZZER WildAttack | ASR 61 | 28 |
| Scrubbing Attack | Harm | AUC 80 | 20 |
| Spoofing Attack Detection | Harm | WCS 8.933 | 18 |
| Harmful Meme Detection | HarM | Accuracy 83.82 | 13 |
| Hateful Meme Detection | HarM | AUC 90.25 | 12 |
| Harmful meme detection | Harm-C (test) | Accuracy 87 | 10 |