| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Attack Evaluation | Five Safety Benchmarks (AdvBench, HarmBench, HarmfulQ, JBBench, StrongReject) | ASR: 7.69 | 6 |
| Safety Evaluation | Safety Benchmarks Overall | Cost per Accuracy Point ($): 0.001 | 4 |
| Safety Evaluation | Safety Benchmarks Aggregate (test) | Generation Quality (Std Prefix): 73.6 | 4 |
| Safety Evaluation | Five Safety Benchmarks direct_q | ASR: 0.02 | 3 |