
DynaBench

Benchmarks

Task Name                          Dataset Name                SOTA Result      Trend
Policy Violation Detection         DynaBench (test)            F1 Score: 86     12
Safety Classification              DynaBench (test)            F1 Score: 75.8   10
Safety Evaluation                  DynaBench Augmented (test)  Accuracy: 72.19  7
Policy-grounded safety evaluation  DynaBench Original          Accuracy: 73.9   5