MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
About
Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement. Accurate control of both camera and object motion is essential for video generation. However, existing works either mainly focus on one type of motion or do not clearly distinguish between the two, limiting their control capabilities and diversity. Therefore, this paper presents MotionCtrl, a unified and flexible motion controller for video generation designed to effectively and independently control camera and object motion. The architecture and training strategy of MotionCtrl are carefully devised, taking into account the inherent properties of camera motion, object motion, and imperfect training data. Compared to previous methods, MotionCtrl offers three main advantages: 1) It effectively and independently controls camera motion and object motion, enabling more fine-grained motion control and facilitating flexible and diverse combinations of both types of motion. 2) Its motion conditions are determined by camera poses and trajectories, which are appearance-free and minimally impact the appearance or shape of objects in generated videos. 3) It is a relatively generalizable model that can adapt to a wide array of camera poses and trajectories once trained. Extensive qualitative and quantitative experiments have been conducted to demonstrate the superiority of MotionCtrl over existing methods. Project Page: https://wzhouxiff.github.io/projects/MotionCtrl/
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Novel View Synthesis | RealEstate10K | PSNR | 16.3 | 116 |
| Novel View Synthesis | DL3DV (test) | PSNR | 13.4003 | 54 |
| 3D Scene Generation | WorldScore | Camera Control | 58.65 | 33 |
| Novel View Synthesis | CO3D | PSNR | 16.16 | 24 |
| Novel View Synthesis | RealEstate10K Hard | PSNR | 16.29 | 20 |
| Novel View Synthesis | RealEstate10K Easy | PSNR | 16.31 | 20 |
| View Synthesis | Tanks&Temples | PSNR | 13.02 | 15 |
| Novel View Synthesis | RealEstate10K Medium | PSNR | 12.0674 | 14 |
| Single-view Novel View Synthesis | DL3DV Short-term (50th frame) | PSNR | 13.34 | 13 |
| Single-view Novel View Synthesis | RealEstate10K Short-term, 50th frame 84 (test) | PSNR | 14.14 | 13 |