
Pro-Cap: Leveraging a Frozen Vision-Language Model for Hateful Meme Detection

About

Hateful meme detection is a challenging multimodal task that requires comprehension of both vision and language, as well as cross-modal interactions. Recent studies have tried to fine-tune pre-trained vision-language models (PVLMs) for this task. However, as model sizes grow, it becomes important to leverage powerful PVLMs more efficiently rather than simply fine-tuning them. Recently, researchers have attempted to convert meme images into textual captions and prompt language models for predictions. This approach performs well but suffers from non-informative image captions. Considering these two factors, we propose a probing-based captioning approach that leverages PVLMs in a zero-shot visual question answering (VQA) manner. Specifically, we prompt a frozen PVLM with hateful-content-related questions and use the answers as image captions (which we call Pro-Cap), so that the captions contain information critical for hateful content detection. The strong performance of models with Pro-Cap on three benchmarks validates the effectiveness and generalization of the proposed method.
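The probing-based captioning idea can be sketched in a few lines. Below is a minimal, runnable illustration of the pipeline shape: a set of hateful-content-related probing questions is posed to a frozen vision-language model in zero-shot VQA mode, and the answers are concatenated into a "Pro-Cap" caption that accompanies the meme text. The question wording, the `answer_question` stub, and the canned answers are illustrative assumptions, not the paper's exact prompts; in practice `answer_question` would call a frozen PVLM such as a zero-shot VQA model.

```python
# Illustrative probing questions targeting attributes relevant to hateful
# content (assumed wording, not the paper's exact prompt set).
PROBING_QUESTIONS = [
    "Who is in the image?",
    "What race is the person in the image?",
    "What religion is referenced in the image?",
]

def answer_question(image, question):
    """Placeholder for querying a frozen PVLM in zero-shot VQA mode.

    A real implementation would run the model's generation step with the
    question as the prompt; canned answers here keep the sketch runnable.
    """
    canned = {
        "Who is in the image?": "a man in a suit",
        "What race is the person in the image?": "white",
        "What religion is referenced in the image?": "none",
    }
    return canned.get(question, "unknown")

def build_pro_cap(image):
    """Concatenate the VQA answers into a single probe-based caption."""
    answers = [answer_question(image, q) for q in PROBING_QUESTIONS]
    return " . ".join(answers)

def classifier_input(meme_text, image):
    """Combine the meme's overlaid text with its Pro-Cap caption, as the
    input a downstream text-based hatefulness classifier would receive."""
    return meme_text + " [SEP] " + build_pro_cap(image)

print(classifier_input("example meme text", image=None))
# → example meme text [SEP] a man in a suit . white . none
```

The key design point is that the PVLM stays frozen: only cheap zero-shot queries are made, and all task-specific learning happens in the smaller downstream classifier.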

Rui Cao, Ming Shan Hee, Adriel Kuek, Wen-Haw Chong, Roy Ka-Wei Lee, Jing Jiang • 2023

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Hateful meme classification | HarM (test) | AUC 91.03 | 31
Meme Classification | MAMI | Accuracy 0.736 | 30
Harmful Meme Detection | FHM | Accuracy 74.95 | 29
Harmful Meme Detection | MAMI | Accuracy 73.06 | 19
Classification | FHM | Accuracy 75.1 | 16
Harmful Meme Detection | ToxiCN | Accuracy 75.7 | 16
Classification | MAMI | Accuracy 73.63 | 16
Hateful Meme Detection | FHM | AUC 80.87 | 12
Hateful Meme Detection | HarM | AUC 90.25 | 12
Hateful Meme Detection | MAMI | AUC 0.8253 | 12

Showing 10 of 25 rows.
