| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Detection | GoalFrameBench | Accuracy: 94 | 24 |
| Jailbreak Detection | GoalFrameBench (seed prompts) | Accuracy: 97 | 16 |
| Jailbreak Detection | GoalFrameBench harmful Llama2-7B 2025 (seed prompts) | Accuracy: 95 | 5 |
| Jailbreak Detection | GoalFrameBench harmful Llama3-8B 2025 (seed prompts) | Accuracy: 0.96 | 5 |
| Jailbreak Detection | GoalFrameBench harmful (Vicuna-7B) 2025 (seed prompts) | Accuracy: 97 | 4 |
| Jailbreak Detection | GoalFrameBench harmful seed prompts (Vicuna-13B) 2025 | Accuracy: 89 | 3 |