
On the Wings of Imagination: Conflicting Script-based Multi-role Framework for Humor Caption Generation

About

Humor is a common yet intricate form of human language in daily life. Humor generation, especially in multi-modal scenarios, is a challenging task for large language models (LLMs). It is typically framed as funny caption generation for images, which requires visual understanding, humor reasoning, creative imagination, and other abilities. Existing LLM-based approaches rely on reasoning chains or self-improvement, and they suffer from limited creativity and interpretability. To address these bottlenecks, we develop a novel LLM-based humor generation mechanism grounded in a fundamental humor theory, the General Theory of Verbal Humor (GTVH). To produce funny, script-opposite captions, we introduce HOMER, a humor-theory-driven multi-role LLM collaboration framework augmented with humor retrieval. The framework consists of three LLM-based roles: (1) a conflicting-script extractor that grounds humor in key script oppositions, which form the basis of caption generation; (2) a retrieval-augmented hierarchical imaginator that identifies key humor targets and expands their creative space through diverse associations structured as imagination trees; and (3) a caption generator that produces funny and diverse captions conditioned on the knowledge obtained in the previous steps. Extensive experiments on two New Yorker Cartoon benchmark datasets show that HOMER outperforms state-of-the-art baselines and powerful LLM reasoning strategies on multi-modal humor captioning.
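The three-role pipeline described above can be pictured with a minimal sketch like the one below. It is not the authors' implementation: it assumes a generic text-in/text-out LLM callable and a retrieval function, and every function name, prompt string, and the semicolon-separated reply format is hypothetical. For simplicity it treats each extracted script directly as a humor target, whereas the framework described above has the imaginator identify key targets first.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from the paper):
# an LLM maps a prompt to a completion; a retriever maps a target to related texts.
LLM = Callable[[str], str]
Retriever = Callable[[str], List[str]]


def _split(reply: str) -> List[str]:
    """Parse an assumed semicolon-separated LLM reply into clean items."""
    return [item.strip() for item in reply.split(";") if item.strip()]


@dataclass
class ImaginationNode:
    """One node of an imagination tree: a concept and its associations."""
    concept: str
    children: List["ImaginationNode"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Collect the most expanded associations (leaf concepts)."""
        if not self.children:
            return [self.concept]
        return [leaf for child in self.children for leaf in child.leaves()]


def extract_conflicting_scripts(llm: LLM, description: str) -> List[str]:
    """Role 1 (sketch): ask the LLM for opposing scripts in the image description."""
    return _split(llm(f"List opposing scripts in: {description}"))


def build_imagination_tree(llm: LLM, target: str, depth: int,
                           retrieve: Retriever) -> ImaginationNode:
    """Role 2 (sketch): expand a target via retrieval-conditioned associations."""
    node = ImaginationNode(target)
    if depth == 0:
        return node
    context = " ".join(retrieve(target))
    for assoc in _split(llm(f"Associations for '{target}' given: {context}")):
        node.children.append(build_imagination_tree(llm, assoc, depth - 1, retrieve))
    return node


def generate_captions(llm: LLM, description: str, retrieve: Retriever,
                      depth: int = 2, n: int = 3) -> List[str]:
    """Role 3 (sketch): condition caption generation on scripts and tree leaves."""
    scripts = extract_conflicting_scripts(llm, description)
    trees = [build_imagination_tree(llm, s, depth, retrieve) for s in scripts]
    ideas = [leaf for tree in trees for leaf in tree.leaves()]
    return [llm(f"Caption {i}: scripts={scripts}; ideas={ideas}") for i in range(n)]
```

In this sketch the tree depth bounds the number of LLM calls, which is one simple way to trade breadth of association against cost; a real system would also need a way to choose which leaves are worth passing to the caption generator.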

Wenbo Shang, Yuxi Sun, Jing Ma, Xin Huang · 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Funny Caption Generation | Humor in AI (#Top10) | Top-1 Accuracy | 66.41 | 32
Funny Caption Generation | Humor in AI (#200-209) | Recall@1 | 73.4 | 32
Funny Caption Generation | Humor in AI (#1000-1009) | Top-1 Accuracy | 76.32 | 32
Funny Caption Generation | Electric Sheep High-Humor | Recall@1 | 75.53 | 32
Funny Caption Generation | Electric Sheep Low-Humor | Top-1 Accuracy | 79.45 | 32
Humor Generation | Electronic Sheep (test) | Visual Understanding Avg Rank | 1.8 | 8
Humorous Caption Generation | Humor in AI Dataset (test) | Visual Understanding Rank | 2.5 | 8
Meme Caption Generation | ImgFlip | Pass@1 | 83.33 | 8
Meme Generation | Meme ImgFlip (test) | Pass@1 | 83.33 | 8
Funny Caption Generation | Humor in AI (test) | p-value | 0.00e+0 | 7

(10 of 18 rows shown)
