Prompting for Multimodal Hateful Meme Classification

About

Hateful meme classification is a challenging multimodal task that requires complex reasoning and contextual background knowledge. Ideally, we could leverage an explicit external knowledge base to supplement contextual and cultural information in hateful memes. However, there is no known explicit external knowledge base that could provide such hate speech contextual information. To address this gap, we propose PromptHate, a simple yet effective prompt-based model that prompts pre-trained language models (PLMs) for hateful meme classification. Specifically, we construct simple prompts and provide a few in-context examples to exploit the implicit knowledge in the pre-trained RoBERTa language model for hateful meme classification. We conduct extensive experiments on two publicly available hateful and offensive meme datasets. Our experimental results show that PromptHate is able to achieve a high AUC of 90.96, outperforming state-of-the-art baselines on the hateful meme classification task. We also perform fine-grained analyses and case studies on various prompt settings and demonstrate the effectiveness of the prompts on hateful meme classification.

Rui Cao, Roy Ka-Wei Lee, Wen-Haw Chong, Jing Jiang• 2023

Related benchmarks

Task	Dataset	Result
Hateful Meme Detection	Hateful Memes (test)	AUROC0.8145	67
Meme Classification	HatefulMemes	Accuracy72.98	60
Harmful Meme Detection	FHM (test)	Accuracy72.98	51
Harmful Meme Detection	FHM	Macro-F171.83	49
Harmful Meme Detection	MAMI	Accuracy70.4	33
Hateful meme classification	HarM (test)	AUC90.96	31
Binary Hate Detection	FHM (test)	Accuracy67.82	25
Fine-grained Hate Categorization	H-VLI (Normal)	Accuracy73.36	24
Binary Hate Detection	H-VLI (Normal)	Accuracy69.2	24
Binary Hate Detection	H-VLI (Hard)	Accuracy37.87	24

Showing 10 of 35 rows

Other info

Code

Follow for update

@wizwand_team Discord