Robustness against harmful content generation on LMSYS harmful queries
[Chart: Attack Success Rate; series: RLBF; date: Feb 9, 2026]
Updated 4d ago
Evaluation Results
| Method | Setting | Links | Attack Success Rate |
|---|---|---|---|
| RLBF | Model Family=LLaMA 3,... | 2026.02 | 1 |
| RLBF | Model Family=LLaMA 3,... | 2026.02 | 1 |
| RLBF | Model Family=Gemma 2,... | 2026.02 | 2 |
| RLBF | Model Family=Gemma 2,... | 2026.02 | 2 |
| RLBF | Model Family=LLaMA 3,... | 2026.02 | 2 |
| BSAFE+ | Model Family=Gemma 2,... | 2026.02 | 14 |
| BSAFE+ | Model Family=LLaMA 3,... | 2026.02 | 14 |
| BSAFE+ | Model Family=Gemma 2,... | 2026.02 | 15 |
| BSAFE+ | Model Family=LLaMA 3,... | 2026.02 | 16 |
| BSAFE+ | Model Family=LLaMA 3,... | 2026.02 | 17 |
| RL | Model Family=LLaMA 3,... | 2026.02 | 22 |
| RL | Model Family=Gemma 2,... | 2026.02 | 23 |
| IT | Model Family=LLaMA 3,... | 2026.02 | 24 |
| RL | Model Family=Gemma 2,... | 2026.02 | 24 |
| IT | Model Family=Gemma 2,... | 2026.02 | 25 |
| RL | Model Family=LLaMA 3,... | 2026.02 | 25 |
| RL | Model Family=LLaMA 3,... | 2026.02 | 25 |
| IT | Model Family=LLaMA 3,... | 2026.02 | 27 |
| IT | Model Family=Gemma 2,... | 2026.02 | 28 |
| IT | Model Family=LLaMA 3,... | 2026.02 | 28 |