English dataset

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Misinformation Detection | English Dataset | Macro F1: 76.08 | 18 |
| Text Classification | English Dataset | Accuracy: 0.9148 | 11 |
| Jailbreak Safety Evaluation | English dataset Multi-Image | StrongREJECT (Perturbed): 14 | 6 |
| Jailbreak Safety Evaluation | English dataset Single-Image | StrongREJECT (Perturbed): 10 | 6 |
| Jailbreak Safety Evaluation | English dataset Text | StrongREJECT Rate: 0.01 | 6 |