
WJB

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Harmful prompt detection | WJB | F1 Score: 97.55 | 17 |
| Harmful Refusal | WJB | ASR: 0.2 | 16 |
| Jailbreak Detection | WJB | ACC: 93.26 | 13 |