
ViPE: Video Pose Engine for 3D Geometric Perception

About

Accurate 3D geometric perception is an important prerequisite for a wide range of spatial AI systems. While state-of-the-art methods depend on large-scale training data, acquiring consistent and precise 3D annotations from in-the-wild videos remains a key challenge. In this work, we introduce ViPE, a handy and versatile video processing engine designed to bridge this gap. ViPE efficiently estimates camera intrinsics, camera motion, and dense, near-metric depth maps from unconstrained raw videos. It is robust to diverse scenarios, including dynamic selfie videos, cinematic shots, and dashcam footage, and supports various camera models such as pinhole, wide-angle, and 360° panoramic cameras. We have evaluated ViPE on multiple benchmarks. Notably, it outperforms existing uncalibrated pose estimation baselines by 18% on TUM and 50% on KITTI sequences, and runs at 3–5 FPS on a single GPU at standard input resolutions. We use ViPE to annotate a large-scale collection of videos comprising around 100K real-world internet videos, 1M high-quality AI-generated videos, and 2K panoramic videos, totaling approximately 96M frames, all annotated with accurate camera poses and dense depth maps. We open-source ViPE and the annotated dataset with the hope of accelerating the development of spatial AI systems.

Jiahui Huang, Qunjie Zhou, Hesam Rabeti, Aleksandr Korovko, Huan Ling, Xuanchi Ren, Tianchang Shen, Jun Gao, Dmitry Slepichev, Chen-Hsuan Lin, Jiawei Ren, Kevin Xie, Joydeep Biswas, Laura Leal-Taixé, Sanja Fidler • 2025
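The abstract lists three per-frame outputs: camera intrinsics, camera motion, and dense depth. The sketch below shows one common way such outputs combine, lifting a single depth map into a world-frame point cloud. It rests on several assumptions that this page does not document: a pinhole camera (the wide-angle and 360° models mentioned above would need a different unprojection), depth stored as distance along the optical axis, and a 4x4 camera-to-world pose matrix. The function name `backproject_depth` is hypothetical and is not part of ViPE's API.

```python
import numpy as np


def backproject_depth(depth, K, cam_to_world):
    """Hypothetical helper: lift an (H, W) depth map to (H*W, 3) world points.

    Assumptions (not taken from ViPE): pinhole intrinsics K (3x3),
    z-depth along the optical axis, and a 4x4 camera-to-world pose.
    """
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Integer pixel coordinates: u runs along width, v along height; both (H, W)
    u, v = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
    z = depth.astype(float)
    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts_cam = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Rigidly transform camera-frame points into the world frame
    R = cam_to_world[:3, :3]
    t = cam_to_world[:3, 3]
    return pts_cam @ R.T + t
```

Points are returned in row-major pixel order, so index `v * W + u` corresponds to pixel `(u, v)`. Repeating this for every frame with its own pose yields a fused scene point cloud, which is where consistent poses and near-metric depth both matter.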

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Camera tracking | BONN (dynamic sequences) | Balloon error | 3.3 | 38 |
| Camera pose estimation | Oxford Spires (sparse setting) | AUC@15 | 45.35 | 18 |
| SLAM | TUM RGB-D | XYZ error (fr3, w) | 2.4 | 9 |
| Tracking | TUM RGB-D (dynamic sequences) | ATE RMSE (ws) [cm] | 0.5 | 8 |
| Tracking | Wild-SLAM MoCap Dataset | ATE RMSE (ANYmal1) | 0.4 | 8 |
| Tracking | 7-Scenes (static) | ATE RMSE | 0.05 | 8 |
| Tracking | TUM RGB-D (static) | ATE RMSE | 0.065 | 8 |
| Tracking | Sintel (low-motion) | ATE RMSE | 0.028 | 7 |
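Most rows above report ATE RMSE (absolute trajectory error, root-mean-square), but the page does not state the alignment protocol behind those numbers. Because an uncalibrated monocular pipeline recovers a trajectory only up to a similarity transform, a common convention is to align estimated camera positions to ground truth with the closed-form Umeyama Sim(3) fit before computing the error. The sketch below assumes that convention; the function names `umeyama_sim3` and `ate_rmse` are illustrative, and the exact protocol used for the reported ViPE numbers may differ.

```python
import numpy as np


def umeyama_sim3(src, dst):
    """Closed-form similarity transform (Umeyama, 1991).

    Returns scale s, rotation R, translation t minimizing
    sum_i ||dst_i - (s * R @ src_i + t)||^2 for (N, 3) point arrays.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    # 3x3 cross-covariance between the centered point sets
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so R is a proper rotation (det = +1)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = float(np.sum(D * np.diag(S)) / var_src)
    t = mu_dst - s * R @ mu_src
    return s, R, t


def ate_rmse(est_xyz, gt_xyz):
    """ATE RMSE after Sim(3) alignment, in the units of gt_xyz (assumed protocol)."""
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    s, R, t = umeyama_sim3(est, gt)
    # Map every estimated position into the ground-truth frame
    aligned = s * est @ R.T + t
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

For example, a ground-truth trajectory that has been rotated, scaled by 0.5, and shifted yields an ATE RMSE of essentially zero, because the alignment undoes the similarity transform. Only errors in trajectory shape survive, which is why the metric is meaningful for methods without known scale.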
