UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

About

We introduce UniReal, a unified framework designed to address various image generation and editing tasks. Existing solutions often vary by tasks, yet share fundamental principles: preserving consistency between inputs and outputs while capturing visual variations. Inspired by recent video generation models that effectively balance consistency and variation across frames, we propose a unifying approach that treats image-level tasks as discontinuous video generation. Specifically, we treat varying numbers of input and output images as frames, enabling seamless support for tasks such as image generation, editing, customization, composition, etc. Although designed for image-level tasks, we leverage videos as a scalable source for universal supervision. UniReal learns world dynamics from large-scale videos, demonstrating advanced capability in handling shadows, reflections, pose variation, and object interaction, while also exhibiting emergent capability for novel applications.

Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Instructive image editing | EMU Edit (test) | CLIP Image Similarity | 0.851 | 46
Instructive image editing | MagicBrush (test) | CLIP Image | 0.903 | 20
Customized Image Generation | DreamBench | CLIP-I Score | 0.806 | 10
