| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Safety Evaluation | Safety Evaluation Suite (HarmBench, StrongReject, WildJailbreak, XSTest) | HarmBench Score: 68.44 | 28 |
| Safety | Safety Evaluation Suite (Salad-Bench, WildJailbreak, JailbreakBench, WildChat, WildGuard) | Safety Rate (S.R.): 100 | 24 |
| Safety | Safety Evaluation Suite | Score: 0.911 | 9 |