
Adversarial and Jailbreaking Attack Detection on AdvBench

0.9675 AUROC

T3-GMM

[Chart: AUROC by date; Feb 4, 2026]
Updated 4d ago

Evaluation Results

Date      AUROC    Metric 2
2026.02   0.9675   0.1577
2026.02   0.9578   0.1731
2026.02   0.8908   0.8269
2026.02   0.8894   1
2026.02   0.8822   1
2026.02   0.867    0.6654
2026.02   0.8658   1
2026.02   0.8241   0.9327
2026.02   0.811    0.75
2026.02   0.7895   0.9558
2026.02   0.7814   0.9942
2026.02   0.5894   0.9827
2026.02   0.5689   0.975
2026.02   0.4461   0.9962
2026.02   0.3989   0.9981
2026.02   0.3681   0.9942
2026.02   0.3575   0.9981
2026.02   0.334    0.9981
2026.02   0.2963   1
2026.02   0.2737   0.9981