The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

About

This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes. It is constructed such that unimodal models struggle and only multimodal models can succeed: difficult examples ("benign confounders") are added to the dataset to make it hard to rely on unimodal signals. The task requires subtle reasoning, yet is straightforward to evaluate as a binary classification problem. We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans (64.73% vs. 84.7% accuracy), illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community.
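Because the task is framed as binary classification, the two metrics that appear on this page (accuracy and AUROC) can be computed directly from per-meme scores. The snippet below is a minimal sketch, not the challenge's official evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of memes whose thresholded score matches the label (1 = hateful).

    The 0.5 threshold is an assumption for this sketch.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random hateful meme is scored above a random benign one.

    Computed by comparing every (hateful, benign) pair; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = hateful, 0 = benign.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.7, 0.1]

print(round(accuracy(labels, scores), 4))
print(round(auroc(labels, scores), 4))
```

Accuracy depends on the chosen threshold, while AUROC depends only on how the scores rank hateful memes relative to benign ones, which is one reason both metrics are commonly reported together.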

Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine • 2020

Related benchmarks

Task	Dataset	Result
Hateful Meme Detection	Hateful Memes (test)	AUROC 0.8265	67
Meme Classification	HatefulMemes	Accuracy 69.47	60
Hateful meme classification	HarM (test)	AUC 83.21	31
Binary Hate Detection	FHM (test)	Accuracy 52.34	25
Hateful Meme Detection	Hateful Memes (val)	AUROC 73.97	22
Content Moderation	Hateful Memes seen (test)	AUC-ROC 82.7	7
Classification	Hateful Memes (test)	--	4
