Adversarial Robustness on WildJailbreak 2k queries (test)
[Chart: Number of Explicit Refusals. Plotted point: AUTOSKILL, 498, Apr 19, 2026. Available metrics: Number of Explicit Refusals, Performance Delta (Δ).]
Updated 1mo ago
Evaluation Results
Method     Setting     Date     Number of Explicit Refusals  Performance Delta (Δ)
AUTOSKILL  Budget=100  2026.04  498                          37
Random     Budget=100  2026.04  461                          -
AUTOSKILL  Budget=90   2026.04  452                          15
AUTOSKILL  Budget=80   2026.04  437                          35
Random     Budget=90   2026.04  437                          -
AUTOSKILL  Budget=70   2026.04  402                          23
Random     Budget=80   2026.04  402                          -
AUTOSKILL  Budget=60   2026.04  388                          12
Random     Budget=70   2026.04  379                          -
Random     Budget=60   2026.04  376                          -
AUTOSKILL  Budget=50   2026.04  141                          16
Random     Budget=50   2026.04  125                          -
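
The page does not define Performance Delta (Δ). The listed values are consistent with the AUTOSKILL refusal count minus the Random refusal count at the same budget, and the short sketch below recomputes them under that assumption. The names `refusals` and `delta_vs_random` are illustrative and do not come from the benchmark.

```python
# Refusal counts as listed in the results above: (method, budget) -> count.
refusals = {
    ("AUTOSKILL", 100): 498,
    ("Random", 100): 461,
    ("AUTOSKILL", 90): 452,
    ("Random", 90): 437,
    ("AUTOSKILL", 80): 437,
    ("Random", 80): 402,
    ("AUTOSKILL", 70): 402,
    ("Random", 70): 379,
    ("AUTOSKILL", 60): 388,
    ("Random", 60): 376,
    ("AUTOSKILL", 50): 141,
    ("Random", 50): 125,
}


def delta_vs_random(results):
    """Assumed definition of Δ: AUTOSKILL count minus Random count per budget."""
    budgets = {budget for (method, budget) in results if method == "AUTOSKILL"}
    return {
        budget: results[("AUTOSKILL", budget)] - results[("Random", budget)]
        for budget in budgets
    }


deltas = delta_vs_random(refusals)
for budget in sorted(deltas, reverse=True):
    print(f"Budget={budget}: delta = {deltas[budget]}")
```

Under this assumption, the recomputed differences match the Δ column at every budget from 50 to 100.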