Refusal behavior defense on WizardLM (test)

90.4 BadNet CACC
Inst_clean
[Chart: BadNet CACC; y-axis ticks 45.056, 56.828, 68.6, 80.372; x-axis Jan 7, 2026]
Updated 4d ago

Evaluation Results

Method    Links
2026.01   90.4   2.8   90.4   8.7   90.4   3.2   90.4   1.4
2026.01   89     3.2   87.6   73.4  83.5   4.1   88.1   0.5
2026.01   85.3   16.1  86.7   87.2  78.4   83    89.9   1.4
2026.01   81.2   0     88.9   1.8   78     0.9   76.6   0.5
2026.01   70.2   42.2  87.6   88.5  78     85.8  58.3   35.8
2026.01   69.3   60.6  85     85.8  76     93.6  59.4   55
2026.01   57.8   3.2   57.8   33    57.8   5.5   57.8   0.9
2026.01   55     1.8   53.2   1.8   52.8   3.7   48.2   2.3
2026.01   52.3   69.3  51.8   82.1  50     83    51.4   27.5
2026.01   50.5   87.6  52.3   86.9  43.1   85.8  34.9   50.5
2026.01   48.6   87.6  48.6   87.2  45     86.7  16.5   87
2026.01   46.8   45    48.6   88.5  48.2   85.8  13.4   82.6