Adversarial Robustness on Skill-Composed 2k queries (test)
[Chart: Explicit Refusals Count and Delta (Δ); dated Apr 19, 2026]
Updated 1mo ago
Evaluation Results
| Method | Setting | Date | Explicit Refusals Count | Delta (Δ) |
|---|---|---|---|---|
| Random | Budget=50 | 2026.04 | 169 | - |
| AUTOSKILL | Budget=50 | 2026.04 | 174 | 5 |
| Random | Budget=60 | 2026.04 | 363 | - |
| AUTOSKILL | Budget=60 | 2026.04 | 382 | 19 |
| Random | Budget=70 | 2026.04 | 515 | - |
| AUTOSKILL | Budget=70 | 2026.04 | 540 | 25 |
| Random | Budget=80 | 2026.04 | 660 | - |
| AUTOSKILL | Budget=80 | 2026.04 | 690 | 30 |
| Random | Budget=90 | 2026.04 | 795 | - |
| AUTOSKILL | Budget=90 | 2026.04 | 824 | 29 |
| Random | Budget=100 | 2026.04 | 905 | - |
| AUTOSKILL | Budget=100 | 2026.04 | 940 | 35 |
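
The Delta (Δ) values listed for AUTOSKILL match its Explicit Refusals Count minus the Random count at the same budget. The page does not state this definition, so treat it as an assumption. A minimal sketch that recomputes the deltas from the listed counts (the names `random_counts`, `autoskill_counts`, and `compute_deltas` are hypothetical and chosen only for illustration):

```python
# Explicit Refusals Count per budget, copied from the results listed above.
random_counts = {50: 169, 60: 363, 70: 515, 80: 660, 90: 795, 100: 905}
autoskill_counts = {50: 174, 60: 382, 70: 540, 80: 690, 90: 824, 100: 940}


def compute_deltas(baseline, method):
    """Return method minus baseline for each budget present in both.

    Assumption: Delta is the per-budget difference from the Random baseline.
    """
    return {b: method[b] - baseline[b] for b in sorted(baseline) if b in method}


deltas = compute_deltas(random_counts, autoskill_counts)
for budget, delta in deltas.items():
    print(f"Budget={budget}: delta={delta}")
```

Under this assumption, the recomputed differences can be compared directly against the Delta (Δ) values reported for each AUTOSKILL row.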