Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Public Prompt Harmfulness Benchmarks

Benchmarks

Task NameDataset NameSOTA ResultTrend
Prompt Harmfulness ClassificationPublic Prompt Harmfulness Benchmarks (ToxicChat, OpenAI Moderation, AegisSafetyTest, SimpleSafetyTests, HarmBenchPrompt)
ToxiC Score74.7
19
Showing 1 of 1 rows