Content Moderation on Lexica UnsafeBench (test)

Hate Safety Score: 65

GGuard

Updated 4d ago

Evaluation Results

Method	Links
GGuard 2025.12		65	60.6	70.5	69.8	50.6	60.6	65.1	51.1	73.5	61.5	55.9	62
GPT-4V 2025.12		25	63.5	71.2	61.3	82.7	87.5	70.1	42.2	90.9	62.1	17.1	70.7