# Mapping Memes to Words for Multimodal Hateful Meme Classification

## About
Multimodal image-text memes are prevalent on the internet, serving as a unique form of communication that combines visual and textual elements to convey humor, ideas, or emotions. However, some memes take a malicious turn, promoting hateful content and perpetuating discrimination. Detecting hateful memes within this multimodal context is a challenging task that requires understanding the intertwined meaning of text and images. In this work, we address this issue by proposing a novel approach named ISSUES for multimodal hateful meme classification. ISSUES leverages a pre-trained CLIP vision-language model and the textual inversion technique to effectively capture the multimodal semantic content of the memes. The experiments show that our method achieves state-of-the-art results on the Hateful Memes Challenge and HarMeme datasets. The code and the pre-trained models are publicly available at https://github.com/miccunifi/ISSUES.
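The core idea described above, fusing CLIP image and text embeddings of a meme into a single representation for binary (hateful / not hateful) classification, can be sketched roughly as follows. This is a conceptual sketch only: the embeddings, dimensions, and linear head are hypothetical placeholders, and the actual ISSUES architecture (including the textual inversion component) is in the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-extracted CLIP embeddings for a batch of memes.
# In practice these would come from a frozen CLIP image encoder and
# text encoder applied to the meme image and its overlaid caption.
batch, dim = 4, 512
img_emb = rng.standard_normal((batch, dim))
txt_emb = rng.standard_normal((batch, dim))

def l2_normalize(x, axis=-1):
    """Normalize vectors to unit length, as is standard for CLIP embeddings."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Fuse the two modalities; concatenation is one simple choice.
fused = np.concatenate([l2_normalize(img_emb), l2_normalize(txt_emb)], axis=1)

# Toy linear classification head (weights would be learned during training).
W = rng.standard_normal((2 * dim, 1)) * 0.01
logits = fused @ W
probs = 1.0 / (1.0 + np.exp(-logits))  # per-meme probability of being hateful

print(probs.shape)
```

The sketch only illustrates the fusion-then-classify pattern; ISSUES additionally maps the image into CLIP's token embedding space via textual inversion before combining the modalities.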
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Hateful meme classification | HarM (test) | AUC | 85.51 | 31 |
| Meme Classification | HarMeme | Accuracy | 81.6 | 30 |
| Hate Detection | PrideMM (test) | Accuracy | 74.68 | 18 |
| Hateful meme classification | HarMeme (test) | Accuracy | 81.64 | 15 |
| Humor Classification | PrideMM (test) | Accuracy | 78.95 | 10 |
| Target identification | PrideMM (test) | Accuracy | 61.25 | 10 |
| Stance Classification | PrideMM (test) | Accuracy | 59.39 | 10 |
| Hateful meme classification | HarMeme | Accuracy | 81.31 | 10 |
| Hateful Meme Detection | Harm-C binary (test) | Accuracy | 81.31 | 10 |