ThinkGen: Generalized Thinking for Visual Generation

About

Recent progress in Multimodal Large Language Models (MLLMs) demonstrates that Chain-of-Thought (CoT) reasoning enables systematic solutions to complex understanding tasks. However, its extension to generation tasks remains nascent and limited by scenario-specific mechanisms that hinder generalization and adaptation. In this work, we present ThinkGen, the first think-driven visual generation framework that explicitly leverages the MLLM's CoT reasoning in various generation scenarios. ThinkGen employs a decoupled architecture comprising a pretrained MLLM and a Diffusion Transformer (DiT), wherein the MLLM generates tailored instructions based on user intent, and the DiT produces high-quality images guided by these instructions. We further propose a separable GRPO-based training paradigm (SepGRPO), which alternates reinforcement learning between the MLLM and DiT modules. This flexible design enables joint training across diverse datasets, facilitating effective CoT reasoning for a wide range of generative scenarios. Extensive experiments demonstrate that ThinkGen achieves robust, state-of-the-art performance across multiple generation benchmarks. Code is available at https://github.com/jiaosiyuu/ThinkGen
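
The decoupled design and alternating training described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the even/odd round schedule, and the use of the population standard deviation are assumptions made for illustration. The only part taken from GRPO in general is that each sampled output's reward is normalized against the other samples in its group.

```python
from statistics import mean, pstdev
from typing import Callable, List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: normalize each reward against its own sampling group.

    Assumption: population standard deviation; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def alternating_schedule(num_rounds: int) -> List[str]:
    """Hypothetical schedule: the module trained in each round (the other stays frozen).

    Assumption: training starts with the MLLM and switches every round.
    """
    return ["mllm" if i % 2 == 0 else "dit" for i in range(num_rounds)]


def generate(prompt: str,
             think: Callable[[str], str],
             render: Callable[[str], str]) -> str:
    """Decoupled pipeline: `think` stands in for the MLLM, `render` for the DiT.

    Both callables are placeholders; a real DiT would return an image, not a string.
    """
    instruction = think(prompt)   # MLLM turns user intent into a tailored instruction
    return render(instruction)    # DiT is conditioned only on that instruction


if __name__ == "__main__":
    print(alternating_schedule(4))
    print(group_relative_advantages([1.0, 2.0, 3.0]))
    print(generate("a red cube", lambda p: f"render: {p}", lambda s: f"<image for '{s}'>"))
```

Because the two modules communicate only through the instruction text, either one can be frozen while the other is updated, which is what makes an alternating schedule like the hypothetical one above possible.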

Siyu Jiao, Yiheng Lin, Yujie Zhong, Qi She, Wei Zhou, Xiaohan Lan, Zilong Huang, Fei Yu, Yingchen Yu, Yunqing Zhao, Yao Zhao, Yunchao Wei • 2025

Related benchmarks

Task	Dataset	Result
Text-to-Image Generation	GenEval	Overall Score: 89	517
Text-to-Image Generation	DPG-Bench	Overall Score: 85.87	451
Reasoning-informed Image Editing	RISE-Bench	Temporal Score: 16.4	64
Reasoning Image Editing	RiseBench 1.0 (test)	Temporal Score: 16.4	30
Reflective Visual Generation	R3-Bench	Color (Sref): 0.78	18
Reasoning Generation	WISE 1.0 (test)	Overall Score: 76	17
Image Editing	ImgEdit (test)	Add Score: 4.75	16
Visual Reasoning	TSP	Accuracy (Scale 12): 0.00e+0	10
Visual Reasoning	Sudoku	Accuracy (Scale 40): 0.00e+0	10
Visual Reasoning	Maze	Accuracy (Scale 8): 0.00e+0	10

Showing 10 of 14 rows
