Harmful Question Forgetting on Harm-2 GPTFUZZER WildAttack

Attack Success Rate (ASR)

Unlearning

[Chart: Attack Success Rate (ASR); x-axis date Apr 7, 2026]
Updated 11d ago

Evaluation Results

Date      ASR
2026.04   0
2026.04   0
2026.04   0
2026.04   0
2026.04   0
2026.04   0
2026.04   0
2026.04   0
2026.04   0
2026.04   0
2026.04   0
2026.04   0
2026.04   0
2026.04   0.1
2026.04   0.5
2026.04   0.9
2026.04   0.9
2026.04   1
2026.04   2.3
2026.04   3.2
2026.04   4.6
2026.04   4.6
2026.04   4.6
2026.04   6.5
2026.04   7.8
2026.04   8.8
2026.04   24.9
2026.04   56.7