
Adversarial and Jailbreaking Attack Detection on MaliciousInstruct

0.8825 AUROC

OpenAI Omni

[Chart; date label: Feb 4, 2026]
Updated 4d ago

Evaluation Results

Method  Date     AUROC   (unlabeled)  Links
        2026.02  0.8825  0.92
        2026.02  0.8718  1
        2026.02  0.8617  0.9982
        2026.02  0.851   0.71
        2026.02  0.8507  1
        2026.02  0.8501  0.78
        2026.02  0.828   0.59
        2026.02  0.7957  0.97
        2026.02  0.7745  1
        2026.02  0.7586  0.68
        2026.02  0.6828  1
        2026.02  0.5954  1
        2026.02  0.5791  0.93
        2026.02  0.3869  1
        2026.02  0.2994  1
        2026.02  0.258   1
        2026.02  0.2252  1
        2026.02  0.1913  1
        2026.02  0.1622  1
        2026.02  0.1441  1
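
For readers unfamiliar with the headline metric, the following is a minimal sketch of how an AUROC value can be computed for a binary detector. It assumes each prompt has a label (1 = attack, 0 = benign) and a detector score where higher means "more likely an attack". The function name `auroc` and the toy data are hypothetical and are not taken from this leaderboard's evaluation code.

```python
def auroc(labels, scores):
    """Pairwise AUROC: the fraction of (attack, benign) pairs in which the
    attack prompt receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = attack prompt, 0 = benign prompt.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

# 8 of the 9 (attack, benign) pairs are ranked correctly.
result = auroc(labels, scores)
print(result)
```

Under this definition, 1.0 means every attack outranks every benign prompt, 0.5 corresponds to random ranking, and values below 0.5 mean the detector ranks benign prompts above attacks more often than not.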