| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Streaming Safety Detection | Safe-RLHF | Det@196.43 | 8 |
| Safety Moderation | Safe RLHF AR | F1 Score 92 | 8 |
| Safety Moderation | Safe RLHF EN | F1 Score 93 | 8 |
| Full-response Safety Guardrail Classification | Safe-RLHF (test) | F1 Score 93.2 | 7 |
| Harmful Query Transformation | Safe-RLHF (test) | Effectiveness 36 | 4 |
| Language Model Alignment | Safe RLHF | Win Rate (Helpfulness) 80.7 | 3 |