| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Misinformation Detection | English Dataset | Macro F1: 76.08 | 18 |
| Text Classification | English Dataset | Accuracy: 0.9148 | 11 |
| Jailbreak Safety Evaluation | English dataset Multi-Image | StrongREJECT (Perturbed): 14 | 6 |
| Jailbreak Safety Evaluation | English dataset Single-Image | StrongREJECT (Perturbed): 10 | 6 |
| Jailbreak Safety Evaluation | English dataset Text | StrongREJECT Rate: 0.01 | 6 |