SOTA Response Harmfulness Detection benchmarks and papers with code

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Last Updated
HarmBench	LLaMA Guard 3	F1 Score	98.94	100	1mo ago
XSTEST-RESP	GuardReasoner-Omni 4B	Response Harmfulness F1	95.48	76	1mo ago
BeaverTails	BeaverDam 7B	F1 Score	89.9	59	1mo ago
SafeRLHF	BeaverDam	F1 Score	72.1	41	1mo ago
Response Harmfulness Detection Benchmarks (HarmBench, SafeRLHF, BeaverTails, XSTest, WildGuard)	COLAGUARD	Macro Avg F1	0.8333	21	1mo ago
HarmTextVideo	GuardReasoner-Omni 4B	F1 Score	95.25	5	4mo ago
SPA-VL-Eval	GuardReasoner-Omni 2B	F1 Score	74.73	5	4mo ago