Harmful Refusal on WJB
Best result: ASR 20.7 (Oct 9, 2025). Metrics: ASR (attack success rate; lower is better) and ELS.
Evaluation Results

Method                 Model tag                    Date      ASR
(name not captured)    Base Model=Llama-3.1-8...    2025.10   20.7
Self-Play + SFT        Base Model=Llama-3.1-8...    2025.10   24
Defender-Only + SFT    Base Model=Llama-3.1-8...    2025.10   43.2
Self-Play              Base Model=Llama-3.1-8...    2025.10   53.6
SFT                    Base Model=Llama-3.1-8...    2025.10   60
Llama-3.1-8B-IT        Model Status=Base Mode...    2025.10   67.5
Defender-Only          Base Model=Llama-3.1-8...    2025.10   69.5
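
As a quick way to read these results, the sketch below computes how far each method's ASR falls relative to the Llama-3.1-8B-IT baseline (67.5), both in absolute points and as a percentage of the baseline. It assumes lower ASR is better. The helper `reduction_vs_baseline` is hypothetical and not part of any benchmark tooling, and the top entry (ASR 20.7) is left out because its method name is not shown on the page.

```python
# ASR values (%) taken from the results above; lower ASR is assumed to be better.
# The top entry (ASR 20.7) is omitted because its method name is not shown.
BASELINE = "Llama-3.1-8B-IT"

asr = {
    "Self-Play + SFT": 24.0,
    "Defender-Only + SFT": 43.2,
    "Self-Play": 53.6,
    "SFT": 60.0,
    "Llama-3.1-8B-IT": 67.5,
    "Defender-Only": 69.5,
}


def reduction_vs_baseline(scores: dict[str, float], baseline: str) -> dict[str, tuple[float, float]]:
    """Hypothetical helper: (absolute drop in points, relative drop in %) for each method vs. the baseline.

    Positive numbers mean the method has a lower (better) ASR than the baseline.
    """
    base = scores[baseline]
    out = {}
    for method, value in scores.items():
        if method == baseline:
            continue  # the baseline is not compared with itself
        drop = base - value
        out[method] = (round(drop, 1), round(100 * drop / base, 1))
    return out


# Print methods from largest to smallest absolute ASR reduction.
results = reduction_vs_baseline(asr, BASELINE)
for method, (abs_drop, rel_drop) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{method:<22} {abs_drop:+6.1f} pts  {rel_drop:+6.1f}%")
```

For example, Self-Play + SFT comes out 43.5 points (about 64%) below the baseline, while Defender-Only sits 2.0 points above it.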