
Harmfulness Refusal on HarmBench Risk 3: Harmful Reduction

[Chart: Attack Success Rate; Nov 11, 2025]
Updated 1mo ago

Evaluation Results

Method    Links
2025.11    09.0158
2025.11    09.0158
2025.11    09.2847
2025.11    09.2847
2025.11    08.7634
2025.11    08.7634
2025.11    28.18.4592
2025.11    40.38.6734
2025.11    57.48.9421
2025.11    682.2545
2025.11    682.3179
2025.11    702.1823