| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Instruction Following | WizardLM (test) | Score 6.87 | 13 |
| Refusal behavior defense | WizardLM (test) | BadNet CACC 90.4 | 12 |
| Toxic behavior defense | WizardLM (test) | BadNet CACC 0.904 | 12 |
| Instruction Following | WizardLM low-resource | Win Rate (bn) 62.8 | 7 |
| Instruction Following Evaluation | WizardLM | Score 72.06 | 5 |
| Generation | WizardLM (test) | LLM-as-a-Judge Score 48.37 | 2 |