DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

About

Customized generation using diffusion models has made impressive progress in image generation, but remains unsatisfactory in the challenging video generation task, as it requires the controllability of both subjects and motions. To that end, we present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject and a few videos of target motion. DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model. The subject learning aims to accurately capture the fine appearance of the subject from provided images, which is achieved by combining textual inversion and fine-tuning of our carefully designed identity adapter. In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern. Combining these two lightweight and efficient adapters allows for flexible customization of any subject with any motion. Extensive experimental results demonstrate the superior performance of our DreamVideo over the state-of-the-art methods for customized video generation. Our project page is at https://dreamvideo-t2v.github.io.
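The abstract describes both adapters as lightweight modules fine-tuned on top of a pre-trained video diffusion model, but it does not give their architecture. As a rough illustration of the general idea only, the sketch below shows a generic residual bottleneck adapter in plain Python. The class name `BottleneckAdapter`, the dimensions, the ReLU activation, and the zero-initialized up-projection are all assumptions made for this example, not details taken from DreamVideo.

```python
import random


def matvec(weights, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class BottleneckAdapter:
    """Generic residual bottleneck adapter (illustrative; not the paper's design)."""

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        # Down-projection (dim -> hidden): small random weights.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(hidden)]
        # Up-projection (hidden -> dim): zeros, so the adapter starts as an
        # identity mapping and leaves the frozen base model's output unchanged.
        self.up = [[0.0] * hidden for _ in range(dim)]

    def __call__(self, features):
        # Project down, apply ReLU, project back up, then add the residual.
        hidden = [max(0.0, v) for v in matvec(self.down, features)]
        delta = matvec(self.up, hidden)
        return [f + d for f, d in zip(features, delta)]

    def num_params(self):
        # Only these weights would be trained; the base model stays frozen.
        return (len(self.down) * len(self.down[0])
                + len(self.up) * len(self.up[0]))


adapter = BottleneckAdapter(dim=8, hidden=2)
x = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, 1.0]
y = adapter(x)
```

With a hidden width much smaller than the feature width, the adapter adds few trainable parameters, which is one common reading of "lightweight" in this setting. Under that reading, a subject adapter and a motion adapter could be trained separately and attached to the same frozen model at inference time.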

Yujie Wei, Shiwei Zhang, Zhiwu Qing, Hangjie Yuan, Zhiheng Liu, Yu Liu, Yingya Zhang, Jingren Zhou, Hongming Shan • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Motion-aware customized video generation | Customized Video Generation Evaluation Set | R-CLIP | 65.6 | 8
Subject and Motion Customization | Human Evaluation 50 groups: 5 motion patterns and 10 subjects | Text Alignment | 82.8 | 6
Video Subject Customization | User Study (120 videos, 15 subjects, 8 prompts each) | Text Alignment | 70.4 | 6
Subject Customization | Subject Customization Evaluation Set | CLIP Similarity (Text) | 0.295 | 5
Video Personalization | VBench | Subject Consistency | 0.9591 | 5
Visual Concept Composition | DAVIS and Internet (test) | CLIP-T Score | 27.43 | 5
Personalized Video Generation | Personalized Video Generation Dataset | IDINO | 32.2 | 5
Customized Video Generation | Customized Video Generation 20 subjects and 30 motions (test) | CLIP Text Alignment Score | 31.4 | 4
Video Motion Customization | Custom Motion Customization Dataset 20 motion patterns 6 text prompts (test) | Text Alignment | 73.8 | 4
Customized Video Generation | Customized Video Generation Dataset (test) | CLIP-T | 0.298 | 4
Showing 10 of 12 rows
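
Several rows report CLIP-based text alignment, but this listing does not say how each benchmark computes it. A common formulation is the mean cosine similarity between a prompt's text embedding and the embedding of each generated frame. The sketch below assumes that formulation and uses toy vectors in place of real CLIP embeddings; the function names and the `scale` parameter are hypothetical. The differing magnitudes in the table (0.295 and 0.298 versus 27.43 and 31.4) would be consistent with some benchmarks reporting the raw similarity and others multiplying it by 100, though that is an assumption rather than something stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def text_alignment(text_emb, frame_embs, scale=1.0):
    """Mean text-to-frame cosine similarity, optionally rescaled.

    text_emb and frame_embs stand in for CLIP text and image embeddings;
    scale=100.0 mimics benchmarks that report the score as a percentage.
    """
    sims = [cosine_similarity(text_emb, frame) for frame in frame_embs]
    return scale * sum(sims) / len(sims)


# Toy 2-D "embeddings": one frame matches the prompt, one is orthogonal.
prompt = [1.0, 0.0]
frames = [[1.0, 0.0], [0.0, 1.0]]

raw_score = text_alignment(prompt, frames)
scaled_score = text_alignment(prompt, frames, scale=100.0)
```

Because the scale and the underlying evaluation sets differ, scores from different rows of the table are not directly comparable.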