
See, Explain, and Intervene: A Few-Shot Multimodal Agent Framework for Hateful Meme Moderation

About

In this work, we examine hateful memes from three complementary angles - how to detect them, how to explain their content, and how to intervene before they are posted - by applying a range of strategies built on top of generative AI models. To the best of our knowledge, explanation and intervention have so far been studied separately from detection, which does not reflect real-world conditions. Further, since curating large annotated datasets for meme moderation is prohibitively expensive, we propose a novel framework that leverages task-specific generative multimodal agents and the few-shot adaptability of large multimodal models to cater to different types of memes. We believe this is the first work focused on generalizable hateful meme moderation under limited data conditions, and it has strong potential for deployment in real-world production scenarios. Warning: Contains potentially toxic content.
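The three-stage pipeline sketched above (detect, then explain and intervene only for flagged memes, each stage driven by a task-specific agent conditioned on a few exemplars) can be illustrated in miniature. This is purely a hedged sketch: the `FewShotAgent` class, the prompt format, and the deterministic `stub_model` standing in for a large multimodal model are all assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of a see/explain/intervene moderation pipeline.
# All names and the stubbed model below are assumptions, not the paper's code.
from dataclasses import dataclass, field

@dataclass
class Meme:
    image_path: str   # visual modality (unused by the text-only stub below)
    caption: str      # textual modality

@dataclass
class FewShotAgent:
    task: str                                      # "detect", "explain", or "intervene"
    exemplars: list = field(default_factory=list)  # a few (Meme, output) pairs

    def build_prompt(self, meme: Meme) -> str:
        # In-context prompt: labeled exemplars first, then the query meme.
        shots = "\n".join(f"[{self.task}] {m.caption} -> {y}" for m, y in self.exemplars)
        return f"{shots}\n[{self.task}] {meme.caption} ->"

    def __call__(self, meme: Meme, model) -> str:
        return model(self.build_prompt(meme))

def run_pipeline(meme: Meme, model, agents: dict) -> dict:
    """Detect first; only memes flagged hateful get an explanation and an
    intervention message shown to the author before posting."""
    result = {"label": agents["detect"](meme, model)}
    if result["label"] == "hateful":
        result["explanation"] = agents["explain"](meme, model)
        result["intervention"] = agents["intervene"](meme, model)
    return result

def stub_model(prompt: str) -> str:
    # Deterministic stand-in for a large multimodal model (assumption).
    if "[detect]" in prompt:
        return "hateful" if "slur" in prompt else "not hateful"
    if "[explain]" in prompt:
        return "The caption demeans a protected group."
    return "Consider rephrasing your caption before posting."

agents = {t: FewShotAgent(t) for t in ("detect", "explain", "intervene")}
flagged = run_pipeline(Meme("a.png", "a slur-laden caption"), stub_model, agents)
benign = run_pipeline(Meme("b.png", "a harmless joke"), stub_model, agents)
```

Running the pipeline on the first (stubbed) meme yields a label plus an explanation and intervention; the second exits after detection, mirroring the gated design where explanation and intervention are produced only for hateful content.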

Naquee Rizwan, Subhankar Swain, Paramananda Bhaskar, Gagan Aryan, Shehryaar Shah Khan, Animesh Mukherjee • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Classification | FHM | Accuracy | 80.26 | 16 |
| Classification | MAMI | Accuracy | 89.1 | 16 |
| Explanation | FHM | rgL | 0.242 | 10 |
| Explanation | MAMI | rgL | 0.238 | 10 |
| Intervention | FHM | rgL | 29 | 10 |
| Intervention | MAMI | rgL | 0.395 | 10 |
