VS3R: Robust Full-frame Video Stabilization via Deep 3D Reconstruction

About

Video stabilization aims to mitigate camera shake, but it faces a fundamental trade-off between geometric robustness and full-frame consistency. 2D methods suffer from aggressive cropping, while 3D techniques are often undermined by fragile optimization pipelines that fail under extreme motion. Novel view synthesis models, in turn, suffer from structural artifacts and scale blindness. To bridge this gap, we propose VS3R, a framework that combines feed-forward 3D reconstruction with generative video diffusion. Our pipeline jointly estimates camera parameters, depth, and masks to remain reliable across all scenarios. It also introduces a Hybrid Stabilized Rendering (HSR) module that fuses semantic and geometric cues to provide an initial treatment of the parallax occlusions caused by pose transformations, while maintaining consistency between dynamic and static content. Finally, a Video Stabilization-Driven Diffusion Model (VSDM) leverages contextual information to restore disoccluded regions, jointly optimizing texture quality and temporal consistency. Collectively, VS3R achieves high-fidelity, full-frame stabilization across diverse camera models and significantly outperforms state-of-the-art methods in robustness and visual quality.
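Rendering a frame from a stabilized camera using estimated depth and camera parameters leaves disoccluded "holes" where no source pixel lands. These holes are the regions a diffusion stage such as VSDM must restore. The sketch below shows only that generic geometry step, under stated assumptions: a pinhole camera with intrinsics `K` shared by both views, positive per-pixel depth, and a known rigid transform from the original to the smoothed camera. The function name `reproject_to_stabilized_view` is hypothetical. This is plain nearest-pixel forward warping with a z-buffer, not the paper's HSR module, which additionally fuses semantic cues and handles dynamic content.

```python
import numpy as np


def reproject_to_stabilized_view(image, depth, K, T_src_to_dst):
    """Forward-warp `image` into a stabilized camera (hypothetical helper).

    image: (H, W) or (H, W, C) array; depth: (H, W) positive source depth;
    K: (3, 3) pinhole intrinsics assumed shared by both views;
    T_src_to_dst: (4, 4) rigid transform mapping source-camera points into
    the stabilized camera frame.
    Returns (warped_image, holes), where holes marks disoccluded pixels.
    """
    H, W = depth.shape
    n = H * W
    # Homogeneous pixel grid, shape (3, n).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(n)])
    # Back-project: X = d * K^-1 [u, v, 1]^T, in the source camera frame.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Move the 3D points into the stabilized camera frame.
    pts_dst = (T_src_to_dst @ np.vstack([pts, np.ones(n)]))[:3]
    z = pts_dst[2]
    # Keep only points in front of the stabilized camera.
    valid = z > 1e-6
    proj = K @ pts_dst[:, valid]
    zv = z[valid]
    u2 = np.round(proj[0] / zv).astype(int)
    v2 = np.round(proj[1] / zv).astype(int)
    src_idx = np.flatnonzero(valid)
    # Discard projections that fall outside the output frame.
    inb = (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H)
    tgt = v2[inb] * W + u2[inb]
    zv, src_idx = zv[inb], src_idx[inb]
    # Z-buffer: sort by target pixel, then by depth, and keep the nearest
    # point per target pixel (first occurrence after sorting).
    order = np.lexsort((zv, tgt))
    tgt, src_idx = tgt[order], src_idx[order]
    _, first = np.unique(tgt, return_index=True)
    flat_src = image.reshape(n, -1)
    out = np.zeros_like(flat_src)
    out[tgt[first]] = flat_src[src_idx[first]]
    holes = np.ones(n, dtype=bool)
    holes[tgt[first]] = False
    return out.reshape(image.shape), holes.reshape(H, W)
```

With an identity transform every pixel maps to itself and `holes` is empty. A small sideways camera translation instead leaves a strip of holes along one image border, the kind of disoccluded region that would then need generative inpainting. Sorting with `np.lexsort` rather than relying on repeated fancy-index assignment keeps the z-buffer deterministic, since NumPy does not guarantee which value wins when an index is assigned more than once.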

Muhua Zhu, Xinhao Jin, Xinping Wang, Yu Zhang, Yifei Xue, Tie Ji, Yizhen Lao • 2026

Related benchmarks

Task	Dataset	Metric	Value
Video Stabilization	NUS (Average of All Categories)	Cropping	100	8
Video Stabilization	NUS (Running)	Cropping	100	8
Video Stabilization	NUS (Crowd)	Cropping Score	99.9	8
Video Stabilization	NUS (Parallax)	Cropping	100	8
Video Stabilization	NUS (Zooming)	Cropping Quality	100	8
Video Stabilization	NUS (Rotation)	Cropping Quality	99.9	8
Video Stabilization	NUS (Regular)	Cropping Quality	100	8
