
Robustness against harmful content generation on LMSYS harmful queries

Attack Success Rate

RLBF

Updated 5mo ago

Evaluation Results

Method	Links	Attack Success Rate
RLBF 2026.02		1
RLBF 2026.02		1
RLBF 2026.02		2
RLBF 2026.02		2
RLBF 2026.02		2
BSAFE+ 2026.02		14
BSAFE+ 2026.02		14
BSAFE+ 2026.02		15
BSAFE+ 2026.02		16
BSAFE+ 2026.02		17
RL 2026.02		22
RL 2026.02		23
IT 2026.02		24
RL 2026.02		24
IT 2026.02		25
RL 2026.02		25
RL 2026.02		25
IT 2026.02		27
IT 2026.02		28
IT 2026.02		28
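The page reports one Attack Success Rate (ASR) per entry but does not say how it is computed. A common definition is the share of harmful queries for which an attack elicits a harmful response. The minimal sketch below assumes that definition, a percentage scale, and boolean per-query verdicts from some judge. All three are assumptions, not details taken from this leaderboard. Lower ASR means fewer successful attacks and therefore a more robust model, so ranking sorts ascending, which matches the order of the rows above. The names `attack_success_rate`, `Entry`, and `rank_by_robustness` are hypothetical helpers for illustration.

```python
from dataclasses import dataclass


def attack_success_rate(judgments: list[bool]) -> float:
    """Percentage of harmful queries on which the attack succeeded.

    judgments[i] is True when the response to query i was judged harmful.
    This is a hypothetical judge output; the leaderboard does not state
    how success is decided.
    """
    if not judgments:
        raise ValueError("need at least one judged query")
    return 100.0 * sum(judgments) / len(judgments)


@dataclass
class Entry:
    method: str
    asr: float  # assumed to be a percentage; lower means more robust


def rank_by_robustness(entries: list[Entry]) -> list[Entry]:
    # Fewer successful attacks = more robust, so sort by ASR ascending.
    return sorted(entries, key=lambda e: e.asr)


if __name__ == "__main__":
    # Hypothetical verdicts for 4 queries: 1 successful attack out of 4.
    print(attack_success_rate([False, True, False, False]))  # 25.0
    rows = [
        Entry("IT 2026.02", 28),
        Entry("RLBF 2026.02", 1),
        Entry("BSAFE+ 2026.02", 14),
    ]
    print([e.method for e in rank_by_robustness(rows)])
    # ['RLBF 2026.02', 'BSAFE+ 2026.02', 'IT 2026.02']
```

Because `sorted` is stable, entries with equal ASR (such as the repeated RLBF rows) keep their original relative order.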