
Refusal Evaluation on WildJailbreak Adversarial Harmful

Best refusal rate: 89.45% (Low-Rank Combination)

[Chart: refusal rate over time, Mar 9, 2026]

Evaluation Results

Method                  Links   Date      Refusal Rate (%)
Low-Rank Combination            2026.03   89.45
                                2026.03   87.65
                                2026.03   78.8
                                2026.03   77.4
                                2026.03   49.1
                                2026.03   44.6
                                2026.03   20.5