Unified Thinker: A General Reasoning Modular Core for Image Generation

About

Despite impressive progress in high-fidelity image synthesis, generative models still struggle with logic-intensive instruction following, exposing a persistent reasoning-execution gap. Meanwhile, closed-source systems (e.g., Nano Banana) have demonstrated strong reasoning-driven image generation, revealing a substantial gap relative to current open-source models. We argue that closing this gap requires not merely better visual generators, but executable reasoning: decomposing high-level intents into grounded, verifiable plans that directly steer the generative process. To this end, we propose Unified Thinker, a task-agnostic reasoning architecture for general image generation, designed as a unified planning core that can plug into diverse generators and workflows. Unified Thinker decouples a dedicated Thinker from the image Generator, enabling modular upgrades of reasoning without retraining the entire generative model. We further introduce a two-stage training paradigm: we first build a structured planning interface for the Thinker, then apply reinforcement learning to ground its policy in pixel-level feedback, encouraging plans that optimize visual correctness over textual plausibility. Extensive experiments on text-to-image generation and image editing show that Unified Thinker substantially improves image reasoning and generation quality.
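The abstract does not specify the Thinker/Generator interface in detail. The following is a minimal Python sketch of the decoupling idea only, assuming that a plan is an ordered tuple of text sub-goals. All names (Plan, Thinker, Generator, SplitThinker, NumberedThinker, EchoGenerator, run) are hypothetical, the toy classes stand in for a real language model and image generator, and the reinforcement-learning stage is not modeled.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Plan:
    """Hypothetical structured plan passed from the Thinker to the Generator."""
    steps: tuple[str, ...]  # ordered sub-goals decomposed from the instruction


class Thinker(Protocol):
    def plan(self, instruction: str) -> Plan: ...


class Generator(Protocol):
    def generate(self, plan: Plan) -> str: ...


class SplitThinker:
    """Toy Thinker (assumption): each ' then '-separated clause becomes one plan step."""
    def plan(self, instruction: str) -> Plan:
        clauses = [c.strip() for c in instruction.split(" then ") if c.strip()]
        return Plan(steps=tuple(clauses))


class NumberedThinker:
    """A drop-in 'upgraded' Thinker: same interface, differently formatted plan."""
    def plan(self, instruction: str) -> Plan:
        base = SplitThinker().plan(instruction)
        return Plan(steps=tuple(f"{i + 1}. {s}" for i, s in enumerate(base.steps)))


class EchoGenerator:
    """Stand-in Generator: returns a string describing the render instead of pixels."""
    def generate(self, plan: Plan) -> str:
        return "render[" + "; ".join(plan.steps) + "]"


def run(thinker: Thinker, generator: Generator, instruction: str) -> str:
    # The Generator only ever sees a Plan, so either side can be swapped
    # without touching (or retraining) the other.
    return generator.generate(thinker.plan(instruction))


if __name__ == "__main__":
    gen = EchoGenerator()
    print(run(SplitThinker(), gen, "draw a cat then add a red hat"))
    print(run(NumberedThinker(), gen, "draw a cat then add a red hat"))
```

Replacing SplitThinker with NumberedThinker changes only the reasoning side; EchoGenerator is reused unchanged, which mirrors the "modular upgrades of reasoning" claim in spirit.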

Sashuai Zhou, Qiang Zhou, Jijin Hu, Hanqing Yang, Yue Cao, Junpeng Ma, Yinchao Ma, Jun Song, Tiezheng Ge, Cheng Yu, Bo Zheng, Zhou Zhao• 2026

Related benchmarks

Task	Dataset	Metric	Result	
Reasoning-based text-to-image generation	WISE	Overall Score	74	70
Reasoning-informed Image Editing	RISE-Bench	Temporal Score	32.9	64
Reasoning-aware Image Generation	RiseBench 1.0 (test)	Instruction Reasoning	0.619	19
Text-to-Image Generation	PRISM	Alignment Score	83.2	14
Instruction-based Image Editing	GEditBench English v1	G_SC	8.17	14
