
Response Harmfulness Detection Benchmarks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Response Harmfulness Detection | Response Harmfulness Detection Benchmarks (HarmBench, SafeRLHF, BeaverTails, XSTest, WildGuard) | Macro Avg F1: 0.8333 | 21
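
The "Macro Avg F1" figure is not defined on this page. A minimal sketch of one plausible reading follows, assuming it is the unweighted mean of per-dataset binary F1 scores, with label 1 meaning "harmful response". The function names and the toy labels are hypothetical and are not taken from any of the listed benchmarks.

```python
from typing import Dict, List, Tuple


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for the positive class (1 = harmful, an assumed convention)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # With no positives and no predicted positives, F1 is undefined; return 0.0.
    return 2 * tp / denom if denom else 0.0


def macro_avg_f1(per_dataset: Dict[str, Tuple[List[int], List[int]]]) -> float:
    """Unweighted mean of per-dataset F1, so each dataset counts equally."""
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in per_dataset.values()]
    return sum(scores) / len(scores)


# Toy labels for illustration only; these are not benchmark data.
toy = {
    "toy_a": ([1, 1, 0, 0], [1, 0, 0, 0]),
    "toy_b": ([1, 0, 1, 0], [1, 1, 1, 0]),
}
macro = macro_avg_f1(toy)
print(round(macro, 4))
```

Under this reading, a small dataset such as XSTest would weigh as much as a large one, which is the usual motivation for macro averaging. If the score is instead macro-averaged over classes within a pooled dataset, the computation would differ.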