On the Wings of Imagination: Conflicting Script-based Multi-role Framework for Humor Caption Generation

About

Humor is a common and intricate form of human language in daily life. Humor generation, especially in multi-modal scenarios, is a challenging task for large language models (LLMs). It is typically framed as funny caption generation for images, which requires visual understanding, humor reasoning, creative imagination, and more. Existing LLM-based approaches rely on reasoning chains or self-improvement, which suffer from limited creativity and interpretability. To address these bottlenecks, we develop a novel LLM-based humor generation mechanism based on a fundamental humor theory, the General Theory of Verbal Humor (GTVH). To produce funny and script-opposite captions, we introduce a humor-theory-driven multi-role LLM collaboration framework augmented with humor retrieval (HOMER). The framework consists of three LLM-based roles: (1) a conflicting-script extractor that grounds humor in key script oppositions, forming the basis of caption generation; (2) a retrieval-augmented hierarchical imaginator that identifies key humor targets and expands their creative space through diverse associations structured as imagination trees; and (3) a caption generator that produces funny and diverse captions conditioned on the obtained knowledge. Extensive experiments on two New Yorker Cartoon benchmark datasets show that HOMER outperforms state-of-the-art baselines and powerful LLM reasoning strategies on multi-modal humor captioning.
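The three roles described above can be sketched as a small pipeline. This is only an illustrative sketch, not the authors' implementation. Every name in it (LLM, ImaginationNode, expand_imagination_tree, run_pipeline) and all prompt wording are hypothetical. It also assumes that the image is already available as a text description and that retrieved humor examples are passed in as plain strings, so visual understanding and retrieval are out of scope here.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical LLM interface (an assumption): prompt text in, completion text out.
LLM = Callable[[str], str]


@dataclass
class ImaginationNode:
    """One node of an imagination tree: a concept and its associations."""
    concept: str
    children: List["ImaginationNode"] = field(default_factory=list)


def _lines(text: str) -> List[str]:
    """Split a completion into non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def extract_conflicting_scripts(llm: LLM, image_description: str) -> str:
    """Role 1: ask for the script opposition that grounds the humor."""
    return llm(f"Identify the two opposing scripts in this scene: {image_description}").strip()


def expand_imagination_tree(llm: LLM, concept: str, retrieved: List[str],
                            depth: int = 2, width: int = 2) -> ImaginationNode:
    """Role 2: recursively expand a humor target into associated concepts."""
    node = ImaginationNode(concept)
    if depth == 0:
        return node
    prompt = (f"List {width} associations for '{concept}', one per line. "
              f"Related humor examples: {'; '.join(retrieved)}")
    for association in _lines(llm(prompt))[:width]:
        node.children.append(
            expand_imagination_tree(llm, association, retrieved, depth - 1, width))
    return node


def flatten(node: ImaginationNode) -> List[str]:
    """Collect every concept in the tree in depth-first order."""
    concepts = [node.concept]
    for child in node.children:
        concepts.extend(flatten(child))
    return concepts


def run_pipeline(llm: LLM, image_description: str, retrieved: List[str],
                 n_captions: int = 3) -> List[str]:
    """Chain the three roles: extractor, imaginator, caption generator."""
    scripts = extract_conflicting_scripts(llm, image_description)
    target = llm(f"Name one key humor target given these scripts: {scripts}").strip()
    tree = expand_imagination_tree(llm, target, retrieved)
    # Role 3: condition caption generation on scripts and imagined associations.
    prompt = (f"Write {n_captions} funny captions, one per line. "
              f"Scripts: {scripts}. Associations: {', '.join(flatten(tree))}")
    return _lines(llm(prompt))[:n_captions]
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client, so a stub function can stand in for the model when checking the control flow.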

Wenbo Shang, Yuxi Sun, Jing Ma, Xin Huang • 2026

Related benchmarks

Task	Dataset	Result
Funny Caption Generation	Humor in AI (#Top10)	Top-1 Accuracy 66.41	32
Funny Caption Generation	Humor in AI (#200-209)	Recall@1 73.4	32
Funny Caption Generation	Humor in AI (#1000-1009)	Top-1 Accuracy 76.32	32
Funny Caption Generation	Electric sheep High-Humor	Recall@1 75.53	32
Funny Caption Generation	Electric sheep Low-Humor	Top-1 Accuracy 79.45	32
Humor Generation	Electronic Sheep (test)	Visual Understanding Avg Rank 1.8	8
Humorous Caption Generation	Humor in AI Dataset (test)	Visual Understanding Rank 2.5	8
Meme Caption Generation	ImgFlip	pass@1 83.33	8
Meme Generation	Meme ImgFlip (test)	Pass@1 83.33	8
Funny Caption Generation	Humor in AI (test)	p-value 0.00e+0	7

Showing 10 of 18 rows
