WildPose: A Unified Framework for Robust Pose Estimation in the Wild

About

Estimating camera pose in dynamic environments is a critical challenge, as most visual SLAM and SfM methods assume static scenes. While recent dynamic-aware methods exist, they are often not unified: semantic-based approaches are brittle, per-sequence optimization methods fail on short sequences, and other learned models may degrade on static-only scenes. We present WildPose, a unified monocular pose estimation framework that is robust in dynamic environments while maintaining state-of-the-art performance on static and low-ego-motion datasets. Our key insight is to connect two powerful paradigms in modern 3D vision: the rich perceptual frontend of feedforward models and the end-to-end optimization of differentiable bundle adjustment (BA). We achieve this with a 3D-aware update operator built on a frozen, pre-trained MASt3R feature backbone, together with a high-capacity motion mask detector that uses multi-level 3D-aware features from the same backbone. Extensive experiments show WildPose consistently outperforms prior methods across dynamic (Wild-SLAM, Bonn), static (TUM, 7-Scenes), and low-ego-motion (Sintel) benchmarks.

Jianhao Zheng, Liyuan Zhu, Zihan Zhu, Iro Armeni • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Camera Tracking | BONN dynamic sequences | Balloon Error | 2.6 | 38 |
| Tracking | TUM RGB-D static | ATE RMSE | 0.027 | 8 |
| Tracking | Wild-SLAM MoCap Dataset | ATE RMSE (ANYmal1) | 0.2 | 8 |
| Tracking | 7-scenes static | ATE RMSE | 0.049 | 8 |
| Tracking | TUM RGB-D (dynamic sequences) | ATE RMSE (ws) [cm] | 0.6 | 8 |
| Tracking | Sintel low-motion | ATE RMSE | 0.017 | 7 |
| Depth Estimation | Bonn RGB-D Dynamic | Abs. Rel. Error | 0.12 | 6 |
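
Most of the tracking rows above report ATE RMSE (absolute trajectory error, root mean square). This page does not state the evaluation protocol behind those numbers. The sketch below shows one common way the metric is computed for monocular methods: align the estimated camera positions to ground truth with a least-squares similarity transform (Umeyama alignment), then take the RMSE of the remaining position errors. The function names `umeyama_align` and `ate_rmse` and the `with_scale` flag are hypothetical, chosen for illustration, and the code assumes the two trajectories are already time-associated as N x 3 arrays.

```python
import numpy as np


def umeyama_align(src, dst, with_scale=True):
    """Least-squares transform (s, R, t) such that s * R @ src_i + t ~= dst_i.

    src, dst: (N, 3) arrays of corresponding positions (assumed pre-associated).
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)            # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                          # avoid returning a reflection
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)     # variance of the source points
    s = float(np.trace(np.diag(D) @ S) / var_src) if with_scale else 1.0
    t = mu_dst - s * R @ mu_src
    return s, R, t


def ate_rmse(est, gt, with_scale=True):
    """RMSE of position errors after aligning est onto gt (both (N, 3))."""
    s, R, t = umeyama_align(est, gt, with_scale)
    aligned = s * est @ R.T + t                 # apply the transform row-wise
    err = np.linalg.norm(aligned - gt, axis=1)  # per-frame position error
    return float(np.sqrt((err ** 2).mean()))


if __name__ == "__main__":
    # Synthetic example: ground truth plus small noise, in an arbitrary frame.
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(100, 3))
    est = 2.0 * gt + 0.01 * rng.normal(size=gt.shape) + np.array([5.0, 0.0, 0.0])
    print("ATE RMSE:", ate_rmse(est, gt))
```

Setting `with_scale=False` restricts the alignment to a rigid transform, which is the usual choice when the estimate already has metric scale. Units follow the input trajectories, so whether a result is in meters or centimeters depends on the dataset convention.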