| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Safety Moderation | Safe RLHF AR | F1 Score: 92 | 8 |
| Safety Moderation | Safe RLHF EN | F1 Score: 93 | 8 |
| Harmful Query Transformation | Safe-RLHF (test) | Effectiveness: 36 | 4 |
| Language Model Alignment | Safe RLHF | Win Rate (Helpfulness): 80.7 | 3 |