Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

About

Academic poster generation is a crucial yet challenging task in scientific communication, requiring the compression of long-context interleaved documents into a single, visually coherent page. To address this challenge, we introduce the first benchmark and metric suite for poster generation, which pairs recent conference papers with author-designed posters and evaluates outputs on (i) Visual Quality: semantic alignment with human posters; (ii) Textual Coherence: language fluency; (iii) Holistic Assessment: six fine-grained aesthetic and informational criteria scored by a VLM-as-judge; and notably (iv) PaperQuiz: the poster's ability to convey core paper content, as measured by VLMs answering generated quizzes. Building on this benchmark, we propose PosterAgent, a top-down, visual-in-the-loop multi-agent pipeline: the (a) Parser distills the paper into a structured asset library; the (b) Planner aligns text-visual pairs into a binary-tree layout that preserves reading order and spatial balance; and the (c) Painter-Commenter loop refines each panel by executing rendering code and using VLM feedback to eliminate overflow and ensure alignment. In our comprehensive evaluation, we find that GPT-4o outputs, though visually appealing at first glance, often exhibit noisy text and poor PaperQuiz scores, and that reader engagement is the primary aesthetic bottleneck, as human-designed posters rely largely on visual semantics to convey meaning. Our fully open-source variants (e.g., based on the Qwen-2.5 series) outperform existing 4o-driven multi-agent systems across nearly all metrics, while using 87% fewer tokens. The pipeline transforms a 22-page paper into a finalized yet editable .pptx poster, all for just $0.005. These findings chart clear directions for the next generation of fully automated poster-generation models. The code and datasets are available at https://github.com/Paper2Poster/Paper2Poster.

Wei Pang, Kevin Qinghong Lin, Xiangru Jian, Xi He, Philip Torr • 2025
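
The abstract describes the Planner as arranging panels in a binary-tree layout that preserves reading order and spatial balance, without giving the algorithm. The following is a minimal illustrative sketch of one way such a layout could work, not the authors' implementation. The names `Box` and `layout`, the per-panel weights, and the rule of alternating split direction at each tree level are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned rectangle on the poster (hypothetical helper type)."""
    x: float
    y: float
    w: float
    h: float


def layout(panels, box, vertical=True):
    """Recursively split `box` among `panels`, a non-empty list of
    (name, weight) pairs given in reading order, with positive weights.

    Assumption: each panel's area is proportional to its weight, and the
    split direction alternates at every level of the binary tree.
    Returns a dict mapping panel name to Box, in reading order.
    """
    if len(panels) == 1:
        return {panels[0][0]: box}

    total = sum(weight for _, weight in panels)

    # Pick the order-preserving split point whose two halves have the
    # most similar total weight (a simple notion of spatial balance).
    best_k, best_gap, running = 1, float("inf"), 0.0
    for k in range(1, len(panels)):
        running += panels[k - 1][1]
        gap = abs(total - 2 * running)
        if gap < best_gap:
            best_k, best_gap = k, gap

    first, second = panels[:best_k], panels[best_k:]
    frac = sum(weight for _, weight in first) / total

    if vertical:
        # Side by side: the earlier panels take the left part.
        a = Box(box.x, box.y, box.w * frac, box.h)
        b = Box(box.x + box.w * frac, box.y, box.w * (1 - frac), box.h)
    else:
        # Stacked: the earlier panels take the top part.
        a = Box(box.x, box.y, box.w, box.h * frac)
        b = Box(box.x, box.y + box.h * frac, box.w, box.h * (1 - frac))

    result = layout(first, a, not vertical)
    result.update(layout(second, b, not vertical))
    return result


# Hypothetical section weights on a 48 x 36 canvas.
panels = [("Intro", 1), ("Method", 2), ("Results", 2), ("Conclusion", 1)]
boxes = layout(panels, Box(0, 0, 48, 36))
for name, b in boxes.items():
    print(name, b)
```

In this sketch the weights stand in for whatever content-size estimate a planner might use (for example, text length or figure area); a real system would also need margins and a way to handle overflow, which the abstract assigns to the Painter-Commenter loop.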

Related benchmarks

Task	Dataset	Result
VLM Pairwise Preference	ArcBench	Win Rate 40	54
Scientific Poster Generation	Paper2Poster	Aesthetic Score (Elemental) 3.95	20
Presentation Generation	Presentation Generation Evaluation Set	ROUGE-L 76.8	15
VLM-based Q/A Quiz	ArcBench VLM-based Q/A Quiz Open	Accuracy (Story) 95.22	15
VLM-as-Judge Evaluation	ArcBench VLM-as-Judge Closed	TQ 36.5	15
VLM-based Q/A Quiz	ArcBench VLM-based Q/A Quiz Closed	Story Accuracy 87	15
VLM-as-Judge Evaluation	ArcBench VLM-as-Judge Open	TQ 65.96	15
Academic Poster Generation	30 top-conference papers	Grid Score 4.5	10
Academic Poster Content Evaluation	30 academic posters 1.0 (test)	Layering 3.03	10
Reading Comprehension	PaperQuiz	Detail Score 63	9

Showing 10 of 17 rows
