Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos

About

Learning to understand dynamic 3D scenes from imagery is crucial for applications ranging from robotics to scene reconstruction. Yet, unlike other problems where large-scale supervised training has enabled rapid progress, directly supervising methods for recovering 3D motion remains challenging due to the fundamental difficulty of obtaining ground truth annotations. We present a system for mining high-quality 4D reconstructions from internet stereoscopic, wide-angle videos. Our system fuses and filters the outputs of camera pose estimation, stereo depth estimation, and temporal tracking methods into high-quality dynamic 3D reconstructions. We use this method to generate large-scale data in the form of world-consistent, pseudo-metric 3D point clouds with long-term motion trajectories. We demonstrate the utility of this data by training a variant of DUSt3R to predict structure and 3D motion from real-world image pairs, showing that training on our reconstructed data enables generalization to diverse real-world scenes. Project page and data at: https://stereo4d.github.io
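As a rough illustration of what "world-consistent 3D point clouds with long-term motion trajectories" can mean in practice, the sketch below lifts one 2D track into world space using per-frame depth and camera pose. It is not the authors' pipeline. It assumes a simple pinhole camera (the source videos are wide-angle, so the real projection model likely differs), camera-to-world poses given as a rotation `R` and translation `t`, and hypothetical helper names (`unproject`, `camera_to_world`, `track_to_world`) that do not come from the paper.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at the given depth into camera coordinates.

    Assumes a pinhole model with focal lengths (fx, fy) and principal
    point (cx, cy); this is an illustrative assumption, not the paper's model.
    """
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def camera_to_world(p_cam, R, t):
    """Apply a camera-to-world pose: X_world = R @ X_cam + t."""
    return tuple(
        sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3)
    )


def track_to_world(track, depths, poses, fx, fy, cx, cy):
    """Turn one 2D track into a 3D trajectory in a shared world frame.

    track  : list of (u, v) pixel positions, one per frame
    depths : list of depths at those pixels, one per frame
    poses  : list of (R, t) camera-to-world poses, one per frame
    """
    if not (len(track) == len(depths) == len(poses)):
        raise ValueError("track, depths and poses must have the same length")
    trajectory = []
    for (u, v), d, (R, t) in zip(track, depths, poses):
        p_cam = unproject(u, v, d, fx, fy, cx, cy)
        trajectory.append(camera_to_world(p_cam, R, t))
    return trajectory


# Example: a point tracked over two frames while the camera moves 1 unit in x.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
trajectory = track_to_world(
    track=[(50.0, 50.0), (60.0, 50.0)],
    depths=[2.0, 2.0],
    poses=[(IDENTITY, (0.0, 0.0, 0.0)), (IDENTITY, (1.0, 0.0, 0.0))],
    fx=100.0, fy=100.0, cx=50.0, cy=50.0,
)
```

Because every frame is mapped into the same world frame, a static scene point yields a constant trajectory, while a moving one yields a path whose frame-to-frame differences are its 3D motion.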

Linyi Jin, Richard Tucker, Zhengqi Li, David Fouhey, Noah Snavely, Aleksander Holynski • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Geometry Estimation | KITTI | Abs. Rel. (Absolute Relative Error) | 11.3 | 13 |
| Geometry Estimation | Stereo4D (test) | EPE | 0.683 | 12 |
| Geometry Estimation | BONN | EPE | 0.423 | 12 |
| Geometry Estimation | Sintel Final | EPE (endpoint error) | 4.296 | 12 |
| Motion estimation (forward) | Stereo4D (test) | 3D Endpoint Error (EPE3D) | 0.118 | 8 |
| Geometry Estimation | KITTI scene flow 2015 (train) | EPE | 4.889 | 6 |
| Scene Flow | KITTI scene flow 2015 (train) | -- | -- | 5 |
| Forward Motion Estimation | KITTI scene flow 2015 (train) | 3D Endpoint Error (EPE3D) | 0.463 | 4 |
| Motion estimation (backward) | Stereo4D (test) | EPE3D | 0.106 | 4 |
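For readers unfamiliar with the metric names in the table, the following is a minimal sketch of the usual textbook definitions of absolute relative depth error and 3D endpoint error. The exact evaluation protocol behind the numbers above (scale alignment, units, validity masks, whether values are reported as percentages) is not stated on this page, so the function names and the masking rule here are assumptions for illustration only.

```python
import math


def abs_rel(pred_depths, gt_depths):
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred_depths, gt_depths) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def epe3d(pred_motion, gt_motion):
    """Mean Euclidean distance between predicted and ground-truth 3D motion vectors."""
    if len(pred_motion) != len(gt_motion) or not pred_motion:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred_motion, gt_motion))
    return total / len(pred_motion)


# Toy inputs: two depth samples and two 3D motion vectors.
depth_error = abs_rel([1.1, 2.0], [1.0, 2.0])
motion_error = epe3d([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                     [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)])
```

Both metrics are averages of per-point errors, so lower is better; the relative error is unitless, while the endpoint error carries the units of the motion vectors.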