
Robustness against priming vulnerability on JBB-Behaviors (test)

[Chart: ASR (Guardrail Model) and RA (ours); date axis: Oct 1, 2025]
Updated 4d ago

Evaluation Results

Date      ASR (Guardrail Model)   RA (ours)
2025.10   0                       130
2025.10   0                       0
2025.10   0                       170
2025.10   0                       430
2025.10   70                      100
2025.10   70                      500
2025.10   100                     230
2025.10   200                     670
2025.10   400                     570
2025.10   530                     1,230
2025.10   1,170                   1,930
2025.10   2,730                   1,770
2025.10   2,830                   1,430
2025.10   2,830                   2,330
2025.10   2,870                   2,600
2025.10   4,300                   4,830
2025.10   5,800                   3,870
2025.10   7,400                   6,100
2025.10   7,630                   3,830
2025.10   8,730                   6,070