CamC2V: Context-aware Controllable Video Generation

About

Recently, image-to-video (I2V) diffusion models have demonstrated impressive scene understanding and generative quality, incorporating image conditions to guide generation. However, these models primarily animate static images without extending beyond their provided context. Introducing additional constraints, such as camera trajectories, can enhance diversity but often degrade visual quality, limiting their applicability for tasks requiring faithful scene representation. We propose CamC2V, a context-to-video (C2V) model that integrates multiple image conditions as context with 3D constraints alongside camera control to enrich both global semantics and fine-grained visual details. This enables more coherent and context-aware video generation. Moreover, we motivate the necessity of temporal awareness for an effective context representation. Our comprehensive study on the RealEstate10K dataset demonstrates a $24.09\%$ (FVD) improvement in visual quality and camera controllability. Our code is publicly available at: https://github.com/LDenninger/CamC2V.

Luis Denninger, Sina Mokhtarzadeh Azar, Juergen Gall • 2025

Related benchmarks

Task                        Dataset                Result                      Rank
Novel View Synthesis        RealEstate10K (test)   --                          8
Camera Trajectory Control   RealEstate10K          Translational Error: 1.53   4
Video Generation            RealEstate10K          FVD (VideoGPT): 53.9        4
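
The abstract quotes its FVD gain as a percentage. As a minimal sketch, assuming the figure is a relative reduction of a lower-is-better score against a baseline (the listing does not state the exact formula), the arithmetic could look like the following. The function name and the input scores are hypothetical placeholders, not values from the paper.

```python
def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage reduction of a lower-is-better metric relative to a baseline.

    Assumption: the improvement is measured as (baseline - ours) / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - ours) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical placeholder scores, NOT results reported by the authors.
    baseline_fvd = 100.0
    model_fvd = 80.0
    print(f"{relative_improvement(baseline_fvd, model_fvd):.2f}%")
```

Under this convention a positive value means the model scores lower (better) than the baseline, and a negative value means it scores worse.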