The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
About
This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes. It is constructed such that unimodal models struggle and only multimodal models can succeed: difficult examples ("benign confounders") are added to the dataset to make it hard to rely on unimodal signals. The task requires subtle reasoning, yet is straightforward to evaluate as a binary classification problem. We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans (64.73% vs. 84.7% accuracy), illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community.
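Because the task reduces to binary classification, scoring a model needs only its per-meme hatefulness scores and the gold labels. The following is a minimal sketch of the two metrics that appear on this page, accuracy and AUROC. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all illustrative assumptions.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the label.

    The 0.5 threshold is an assumption made for this sketch.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen hateful example (label 1)
    scores higher than a randomly chosen benign one (label 0); ties count 0.5.
    This O(n_pos * n_neg) form is meant for clarity, not for large datasets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration: 1 = hateful, 0 = benign.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1]

print(f"accuracy = {accuracy(toy_labels, toy_scores):.4f}")
print(f"AUROC    = {auroc(toy_labels, toy_scores):.4f}")
```

Accuracy depends on the chosen threshold, whereas AUROC depends only on how the scores rank hateful examples relative to benign ones, which is one reason both are commonly reported for this kind of task.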
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Hateful Meme Detection | Hateful Memes (test) | AUROC 0.8265 | 67 |
| Hateful meme classification | HarM (test) | AUC 83.21 | 31 |
| Hateful Meme Detection | Hateful Memes (val) | AUROC 73.97 | 22 |
| Content Moderation | Hateful Memes seen (test) | AUC-ROC 82.7 | 7 |