PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention

About

We propose PostCam, a streamlined framework for novel-view video generation that achieves superior detail preservation and precise camera trajectory editing in dynamic scenes. Current methods often struggle with a trade-off between pose-based control, which lacks visual detail, and rendering-based guidance, which is overly sensitive to geometric accuracy. Despite recent hybrid attempts, achieving precise motion and visual consistency remains challenging due to the lack of effective cross-modal alignment. We argue that robust control stems from the deep alignment of multimodal signals rather than increased input complexity. Our core contribution is the Query-Shared Cross-Attention mechanism, which projects 6-DoF poses and rendered features into a unified latent space. This allows the model to spontaneously achieve intrinsic consistency between motion cues and pixel-level guidance during denoising. Experiments demonstrate that PostCam maintains high-fidelity visual details while outperforming state-of-the-art methods by 20% in trajectory precision, exhibiting superior robustness in complex dynamic scenes. Our project webpage is publicly available at: https://cccqaq.github.io/PostCam.github.io/
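To illustrate the idea behind Query-Shared Cross-Attention, here is a minimal NumPy sketch of one possible reading of the abstract: a single query projection of the video latents attends to both pose tokens and rendered-feature tokens, and the two results are added back to the latents. This is not the authors' implementation. The function names, the 6-dimensional pose encoding, the weight shapes, and the additive fusion are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Standard scaled dot-product attention for one head.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def init_params(rng, d_model, d_attn, d_pose, d_render):
    # Hypothetical parameter layout: one shared query projection,
    # plus separate key/value projections per conditioning modality.
    def w(rows, cols):
        return rng.normal(scale=rows ** -0.5, size=(rows, cols))
    return {
        "W_q": w(d_model, d_attn),
        "W_k_pose": w(d_pose, d_attn),
        "W_v_pose": w(d_pose, d_model),
        "W_k_render": w(d_render, d_attn),
        "W_v_render": w(d_render, d_model),
    }

def query_shared_cross_attention(latents, pose_tokens, render_tokens, params):
    # The same query is reused for both modalities (assumed reading of
    # "query-shared"), so both attention maps live in one latent space.
    q = latents @ params["W_q"]
    out_pose = attend(
        q, pose_tokens @ params["W_k_pose"], pose_tokens @ params["W_v_pose"]
    )
    out_render = attend(
        q, render_tokens @ params["W_k_render"], render_tokens @ params["W_v_render"]
    )
    # Assumed fusion: residual sum of the two attention outputs.
    return latents + out_pose + out_render

rng = np.random.default_rng(0)
params = init_params(rng, d_model=16, d_attn=8, d_pose=6, d_render=32)
latents = rng.normal(size=(8, 16))        # 8 noisy video latent tokens
pose_tokens = rng.normal(size=(4, 6))     # 4 frames, assumed 6-DoF encoding
render_tokens = rng.normal(size=(10, 32)) # 10 rendered-feature tokens
out = query_shared_cross_attention(latents, pose_tokens, render_tokens, params)
print(out.shape)  # (8, 16)
```

Sharing the query means pose and rendering cues are scored against the same representation of each latent token, which is one way the alignment described above could arise. The actual PostCam architecture may differ in fusion, head count, and pose encoding.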

Yipeng Chen, Zhichao Ye, Zhenzhou Fang, Xinyu Chen, Xiaoyu Zhang, Jialing Liu, Nan Wang, Guofeng Zhang, Haomin Liu • 2025

Related benchmarks

Task	Dataset	Result
Camera-conditioned video generation	Synthetic dataset	Rotation Error: 0.0495	4
Camera-conditioned video generation	User Study (MOS)	Visual Quality: 4.42	4
Camera-conditioned video generation	OpenVid	Rotation Error (Rot Err): 0.0501	4
