Robustness against harmful content generation

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Trend
LMSYS harmful queries	RLBF	Attack Success Rate	1		20	4mo ago
LMSYS-MF attacks		Attack Success Rate	0.81		20	4mo ago
