
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

About

Although existing unified models achieve strong performance in vision-language understanding and text-to-image generation, they remain limited in image perception and manipulation, capabilities increasingly demanded in practical applications. Recently, OpenAI introduced the GPT-4o-Image model, which demonstrates advanced capabilities in comprehensive image perception and manipulation and has sparked widespread interest. Through carefully designed experiments, we observe that GPT-4o-Image likely relies on semantic encoders rather than VAEs for feature extraction, even though VAEs are commonly regarded as crucial for image manipulation tasks. Inspired by this insight, we propose UniWorld-V1, a unified generative framework built upon semantic features extracted from powerful multimodal large language models and contrastive semantic encoders. Trained on only 2.7M samples, UniWorld-V1 achieves strong performance across diverse tasks, including image understanding, generation, manipulation, and perception. We fully open-source the UniWorld-V1 framework, including model weights, training and evaluation scripts, and datasets, to promote reproducibility and further research.

Bin Lin, Zongjian Li, Xinhua Cheng, Yuwei Niu, Yang Ye, Xianyi He, Shenghai Yuan, Wangbo Yu, Shaodong Wang, Yunyang Ge, Yatian Pang, Li Yuan • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multimodal Understanding | MMBench | -- | 637 |
| Multimodal Understanding | MM-Vet | MM-Vet Score: 67.1 | 531 |
| Text-to-Image Generation | GenEval | Overall Score: 84 | 506 |
| Multimodal Reasoning | MM-Vet | MM-Vet Score: 67.1 | 431 |
| Text-to-Image Generation | GenEval | GenEval Score: 84 | 360 |
| Text-to-Image Generation | DPG-Bench | Overall Score: 81.38 | 265 |
| Text-to-Image Generation | GenEval (test) | Two Obj. Acc: 93 | 221 |
| Text-to-Image Generation | GenEval | Overall Score: 84 | 218 |
| Image Editing | ImgEdit-Bench | Overall Score: 3.26 | 191 |
| Text-to-Image Generation | DPG | Overall Score: 81.38 | 172 |
Showing 10 of 103 rows
...
