| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Safety Evaluation | SafetyBench en | Avg Score: 81.2 | 25 |
| Safety Evaluation | SafetyBench zh | Avg Score: 83.2 | 21 |
| Safety Evaluation | SafetyBench (test) | Accuracy: 81.321 | 9 |
| Jailbreak Attack | SafetyBench LLaVA-2 Integrated from AdvBench (test) | Illegal Activity Success Rate: 83.73 | 4 |
| Jailbreak Attack | SafetyBench MiniGPT-4 Integrated from AdvBench (test) | IA (Illegal Activity): 0.7024 | 4 |