Toxic Text Generation on RealToxicityPrompts malicious
[Chart: Attack Success Rate (ASR). Highlighted result: SAD, 14.8 (Apr 28, 2026). Updated 22d ago.]
Evaluation Results
Method  Configuration              Date     Attack Success Rate (ASR)
SAD     Model=LLaDA, Negation...   2026.04  14.8
SAD     Model=LLaDA, Negation...   2026.04  16
SAD     Model=LLaDA, Negation...   2026.04  17.6
Base    Model=LLaDA, Negation...   2026.04  19.8
SAD     Model=MDLM, Negation S...  2026.04  32.6
SAD     Model=MDLM, Negation S...  2026.04  32.8
SAD     Model=MDLM, Negation S...  2026.04  33.4
Base    Model=MDLM, Negation S...  2026.04  38.4