
Text Safety Filter Bypass on NSFW-200 (ShieldLM 1.0 Target Filter)

Bypass Rate: 92.5

OptJail

[Chart: Bypass Rate over time; y-axis ticks 92.24, 93.995, 95.75, 97.505; x-axis label May 25, 2025]
Updated 8d ago

Evaluation Results

Date      Bypass Rate
2025.05   92.5
2025.05   94
2025.05   99