| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Policy Violation Detection | DynaBench (test) | F1 Score: 86 | 12 |
| Safety Classification | DynaBench (test) | F1 Score: 75.8 | 10 |
| Safety Evaluation | DynaBench Augmented (test) | Accuracy: 72.19 | 7 |
| Policy-grounded safety evaluation | DynaBench Original | Accuracy: 73.9 | 5 |