Misuse Detection on Misuse Categories Psychological Harm (Anti-LGBTQ)

100 AUC

Moderator (OpenAI)

Updated 4mo ago

Evaluation Results

Method	Links
Moderator (OpenAI) 2026.01		100	100	0
GAVEL 2026.01		100	100	0
RepBending 2026.01		99	99	1
CAST 2026.01		99	91	17
Llama Guard 4 (Meta) 2026.01		99	99	1
Activation Classifier 2026.01		99	98	3
Circuit Breakers 2026.01		94	95	9
JBShield 2026.01		73	84	1
Perspective (Google) 2026.01		8	62	0