FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning
About
We introduce FaceCam, a system that generates video under customizable camera trajectories from a monocular human portrait video input. Recent camera-control approaches built on large video-generation models have shown promising progress, but they often exhibit geometric distortions and visual artifacts on portrait videos due to scale-ambiguous camera representations or 3D reconstruction errors. To overcome these limitations, we propose a face-tailored, scale-aware representation of camera transformations that provides deterministic conditioning without relying on 3D priors. We train a video generation model on both multi-view studio captures and in-the-wild monocular videos, and introduce two camera-control data generation strategies, synthetic camera motion and multi-shot stitching, that exploit stationary training cameras while generalizing to dynamic, continuous camera trajectories at inference time. Experiments on the Ava-256 dataset and diverse in-the-wild videos demonstrate that FaceCam achieves superior camera controllability, visual quality, and identity and motion preservation.
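To make the scale-aware conditioning concrete, the following is a minimal sketch, not the paper's implementation: it assumes the representation amounts to per-frame camera poses expressed relative to the first frame, with translations normalized by an estimated face scale (e.g., inter-ocular distance) so the trajectory is free of metric scale ambiguity. The function name, pose convention, and face-scale source are illustrative assumptions.

```python
import numpy as np

def relative_scale_normalized_poses(cam_to_world, face_scale):
    """Express per-frame camera poses relative to the first frame and
    normalize translations by an estimated face scale, producing a
    scale-free camera trajectory signal for conditioning.

    cam_to_world: (T, 4, 4) per-frame camera-to-world extrinsics.
    face_scale:   scalar face size estimate in the same world units
                  (assumed here; e.g., inter-ocular distance).
    Returns:      (T, 4, 4) relative, scale-normalized transforms.
    """
    ref_inv = np.linalg.inv(cam_to_world[0])                 # world -> reference camera
    rel = np.einsum('ij,tjk->tik', ref_inv, cam_to_world)    # pose of frame t in the reference frame
    rel = rel.copy()
    rel[:, :3, 3] /= face_scale                              # translations in "face units"
    return rel

# Usage example: 16 frames with a slow dolly-in along the camera z-axis.
poses = np.tile(np.eye(4), (16, 1, 1))
poses[:, 2, 3] = np.linspace(0.0, 0.2, 16)                   # 20 cm dolly over the clip
conditioning = relative_scale_normalized_poses(poses, face_scale=0.12)
```

Because the translations are divided by the face scale, the same conditioning sequence describes the same apparent camera motion regardless of how large the subject is in world units, which is one way to obtain a deterministic, scale-unambiguous signal without 3D reconstruction.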
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Camera-controlled Video Generation | Ava-256 (static camera setting) | PSNR 15.85 | 4 |
| Controllable Portrait Video Generation | In-the-wild videos | Camera Correctness 100 | 4 |