
Easi3R: Estimating Disentangled Motion from DUSt3R Without Training

About

Recent advances in DUSt3R have enabled robust estimation of dense point clouds and camera parameters of static scenes, leveraging Transformer network architectures and direct supervision on large-scale 3D datasets. In contrast, the limited scale and diversity of available 4D datasets present a major bottleneck for training a highly generalizable 4D model. This constraint has driven conventional 4D methods to fine-tune 3D models on scalable dynamic video data with additional geometric priors such as optical flow and depth. In this work, we take the opposite path and introduce Easi3R, a simple yet efficient training-free method for 4D reconstruction. Our approach applies attention adaptation during inference, eliminating the need for from-scratch pre-training or network fine-tuning. We find that the attention layers in DUSt3R inherently encode rich information about camera and object motion. By carefully disentangling these attention maps, we achieve accurate dynamic region segmentation, camera pose estimation, and 4D dense point map reconstruction. Extensive experiments on real-world dynamic videos demonstrate that our lightweight attention adaptation significantly outperforms previous state-of-the-art methods that are trained or fine-tuned on extensive dynamic datasets. Our code is publicly available for research purposes at https://easi3r.github.io/

Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Video Depth Estimation | Sintel | Delta Threshold Accuracy (1.25) | 55.9 | 193 |
| Camera pose estimation | Sintel | ATE | 0.11 | 192 |
| Camera pose estimation | TUM-dynamic | ATE | 0.105 | 163 |
| Video Depth Estimation | KITTI | AbsRel | 0.102 | 126 |
| Camera pose estimation | ScanNet | RPE (t) | 0.017 | 119 |
| Video Depth Estimation | BONN | AbsRel | 5.9 | 116 |
| Camera pose estimation | TUM dynamics | ATE | 0.105 | 81 |
| Video Object Segmentation | DAVIS 2016 | J-Measure | 54.93 | 50 |
| Depth Estimation | Sintel ~50 frames | AbsRel | 0.377 | 47 |
| Depth Estimation | KITTI 110 frames | AbsRel | 10.2 | 46 |
Showing 10 of 25 rows
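
For readers unfamiliar with the metrics listed above, the following is a minimal sketch of how AbsRel, delta threshold accuracy (1.25), and the J-measure are commonly defined. It is not the evaluation code behind these benchmark entries. The function names are hypothetical, inputs are flattened lists of per-pixel values, and any scale or shift alignment that a benchmark applies before scoring is omitted. The sketch also assumes at least one valid ground-truth pixel. Both AbsRel and delta accuracy are returned as fractions; the listing appears to report some entries as percentages (for example 0.102 and 10.2 for KITTI), which is an assumption based on the values alone.

```python
def abs_rel(pred, gt):
    """Mean absolute relative depth error over pixels with valid ground truth (gt > 0)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels whose ratio max(pred/gt, gt/pred) is below the threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


def j_measure(pred_mask, gt_mask):
    """Region similarity J: intersection over union of two binary masks."""
    inter = sum(1 for p, g in zip(pred_mask, gt_mask) if p and g)
    union = sum(1 for p, g in zip(pred_mask, gt_mask) if p or g)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


# Toy example with four pixels (hypothetical values, not benchmark data).
pred_depth = [1.0, 2.2, 4.0, 8.0]
gt_depth = [1.0, 2.0, 4.0, 4.0]
print(abs_rel(pred_depth, gt_depth))
print(delta_accuracy(pred_depth, gt_depth))
print(j_measure([1, 1, 0, 0], [1, 0, 1, 0]))
```

Lower is better for AbsRel, while higher is better for delta accuracy and the J-measure. Multiplying the returned fractions by 100 gives the percentage form.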
