Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning

About

Current feed-forward 3D/4D reconstruction systems rely on dense geometry and pose supervision, which is expensive to obtain at scale and particularly scarce for dynamic real-world scenes. We present Flow3r, a framework that augments visual geometry learning with dense 2D correspondences ("flow") as supervision, enabling scalable training from unlabeled monocular videos. Our key insight is that the flow prediction module should be factored: it predicts flow between two images using geometry latents from one image and pose latents from the other. This factorization directly guides the learning of both scene geometry and camera motion, and naturally extends to dynamic scenes. In controlled experiments, we show that factored flow prediction outperforms alternative designs and that performance scales consistently with the amount of unlabeled data. By integrating factored flow into existing visual geometry architectures and training on ~800K unlabeled videos, Flow3r achieves state-of-the-art results across eight benchmarks spanning static and dynamic scenes, with its largest gains on in-the-wild dynamic videos, where labeled data is scarcest.
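One way to see why the factorization pairs geometry from one image with pose from the other is the static-scene case. There, the flow from image i to image j is fully determined by the depth of image i (geometry) and the pose of camera j expressed relative to camera i. The sketch below computes that analytic flow for a pinhole camera. It assumes a static scene and intrinsics `K` shared by both views, and the function name `geometric_flow` is hypothetical. Flow3r's own flow module is learned and operates on latents rather than explicit depth and pose, so this only illustrates the underlying geometry.

```python
import numpy as np


def geometric_flow(depth_i: np.ndarray, K: np.ndarray,
                   R_ij: np.ndarray, t_ij: np.ndarray) -> np.ndarray:
    """Analytic flow from image i to image j for a static scene (illustrative sketch).

    depth_i: (H, W) depth map of image i           -> the "geometry" factor (from image i)
    K:       (3, 3) pinhole intrinsics, assumed shared by both views
    R_ij:    (3, 3) rotation mapping camera-i coordinates into camera-j coordinates
    t_ij:    (3,)   translation of that same rigid transform -> the "pose" factor
    Returns an (H, W, 2) array: pixel location in image j minus pixel location in image i.
    """
    H, W = depth_i.shape
    # Pixel grid of image i in homogeneous coordinates, shape (H, W, 3).
    u, v = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    # Back-project each pixel along its ray K^{-1} [u, v, 1]^T and scale by depth.
    rays = pix @ np.linalg.inv(K).T
    pts_i = rays * depth_i[..., None]
    # Rigidly move the 3D points into camera j's frame: X_j = R_ij X_i + t_ij.
    pts_j = pts_i @ R_ij.T + t_ij
    # Project with K and dehomogenize to get pixel coordinates in image j.
    proj = pts_j @ K.T
    uv_j = proj[..., :2] / proj[..., 2:3]
    return uv_j - pix[..., :2]


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
    depth = np.full((48, 64), 2.0)  # fronto-parallel plane at depth 2
    flow = geometric_flow(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
    print(flow.shape, flow[..., 0].mean(), flow[..., 1].mean())
```

With constant depth 2, focal length 100 and a sideways translation of 0.1, every pixel shifts horizontally by f·t_x/Z = 100·0.1/2 = 5 pixels. Halving the depth doubles the shift to 10 pixels, so flow observed between two frames constrains the depth of the first frame as well as the relative camera motion. Moving objects break the static-scene assumption made here.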

Zhongxiao Cong, Qitao Zhao, Minsik Jeon, Shubham Tulsiani (2026)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Dynamic Scene Reconstruction | BONN | RPE Translation | 0.094 | 10 |
| 3D Reconstruction | NRGBD | Accuracy | 0.8 | 9 |
| 3D Reconstruction | Sintel | ATE | 0.048 | 5 |
| 3D Reconstruction | Kinetics700 | ATE | 0.013 | 5 |
| 3D Reconstruction and Pose Estimation | CO3D v2 | RRA@30 | 98.84 | 5 |
| Dynamic Scene Reconstruction | Kinetics 700 | RPE Translation | 0.018 | 5 |
| Dynamic Scene Reconstruction | Epic-Kitchens | RPE Translation | 0.037 | 5 |
| Dynamic Scene Reconstruction | Sintel | RPE Translation | 0.058 | 5 |
| Static Scene Reconstruction | CO3D v2 | RTA@30 | 0.9762 | 5 |
| Static Scene Reconstruction | NRGBD | RTA@30 | 99.6 | 5 |

Ten of the 14 benchmark rows are shown.
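Several rows above report RRA@30 and RTA@30. As these metrics are commonly defined in multi-view pose evaluation, RRA@30 is the percentage of image pairs whose relative-rotation error is below 30 degrees, and RTA@30 applies the same threshold to the angle between relative translation directions. The following is a minimal sketch of RRA at a configurable threshold. It assumes world-to-camera rotation matrices and evaluates all image pairs. The function names `rotation_angle_deg`, `rra_at` and `rot_z` are hypothetical, and each benchmark's exact protocol (pair sampling, alignment) may differ.

```python
import numpy as np


def rotation_angle_deg(R: np.ndarray) -> float:
    """Rotation angle (degrees) of a 3x3 rotation matrix, using trace(R) = 1 + 2 cos(theta)."""
    cos_theta = (np.trace(R) - 1.0) / 2.0
    # Clip so floating-point round-off never pushes arccos outside its domain.
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def rra_at(R_pred: np.ndarray, R_gt: np.ndarray, thresh_deg: float = 30.0) -> float:
    """Relative Rotation Accuracy: % of image pairs whose relative-rotation error < thresh_deg.

    R_pred, R_gt: (N, 3, 3) world-to-camera rotations for the same N images (assumption).
    """
    n = len(R_gt)
    hits, total = 0, 0
    for a in range(n):
        for b in range(a + 1, n):
            # Relative rotation taking camera a's frame to camera b's frame.
            rel_pred = R_pred[b] @ R_pred[a].T
            rel_gt = R_gt[b] @ R_gt[a].T
            # Angular distance between predicted and ground-truth relative rotations.
            err = rotation_angle_deg(rel_pred.T @ rel_gt)
            hits += int(err < thresh_deg)
            total += 1
    return 100.0 * hits / total


def rot_z(deg: float) -> np.ndarray:
    """Rotation about the z-axis by `deg` degrees (helper for the example)."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    gt = np.stack([np.eye(3), np.eye(3), np.eye(3)])
    # Third predicted camera is off by 45 degrees, so only pair (0, 1) passes at 30 degrees.
    pred = np.stack([np.eye(3), np.eye(3), rot_z(45.0)])
    print(rra_at(pred, gt))
```

In the example, two of the three pairs involve the misrotated camera, so RRA@30 is one third of the pairs (about 33.3%), while a looser 50-degree threshold would accept all three.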
