$R^3$: 3D Reconstruction via Relative Regression
About
Recent feed-forward geometry foundation models have demonstrated impressive generalization by recovering depth and poses in a single forward pass. However, these models typically assume a global coordinate frame. This assumption becomes a significant bottleneck for long-context and streaming reconstruction, because it forces the network to maintain an arbitrary temporal origin and to handle translation magnitudes that grow without bound over time. Our solution, $R^3$, uses relative regression instead: a lightweight MLP predicts confidence-weighted relative constraints. These confidences serve as a unified anchor, weighting the losses during training and guiding pose aggregation during inference. $R^3$ supports both full-context offline reconstruction and causal, bounded-memory streaming. Evaluation in both offline and streaming settings validates the effectiveness of the relative mechanism. Project page: https://kevinxu02.github.io/r3-site
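To make the idea of confidence-guided pose aggregation concrete, the following is a minimal sketch and not the authors' implementation. It is a simplified planar (SE(2)) analogue: the abstract does not specify the aggregation rule, so the weighted mean used here, the function names `compose` and `aggregate_pose`, and the `(x, y, theta)` pose layout are all assumptions made for illustration. Each already-placed frame proposes a pose for the new frame by composing its own pose with a predicted relative pose, and the proposals are fused using the predicted confidences as weights.

```python
import math


def compose(anchor, relative):
    """Compose two planar poses given as (x, y, theta).

    `anchor` is the pose of frame j in the world frame; `relative` is the
    pose of frame i expressed in frame j. Returns frame i in the world frame.
    """
    ax, ay, at = anchor
    rx, ry, rt = relative
    c, s = math.cos(at), math.sin(at)
    return (ax + c * rx - s * ry, ay + s * rx + c * ry, at + rt)


def aggregate_pose(anchors, relatives, confidences):
    """Fuse candidate poses for one new frame (hypothetical aggregation rule).

    Positions are averaged with the confidences as weights. Headings use a
    weighted circular mean so that angles near +pi and -pi average correctly.
    """
    total = sum(confidences)
    if total <= 0:
        raise ValueError("at least one positive confidence is required")
    candidates = [compose(a, r) for a, r in zip(anchors, relatives)]
    x = sum(w * p[0] for w, p in zip(confidences, candidates)) / total
    y = sum(w * p[1] for w, p in zip(confidences, candidates)) / total
    sin_sum = sum(w * math.sin(p[2]) for w, p in zip(confidences, candidates))
    cos_sum = sum(w * math.cos(p[2]) for w, p in zip(confidences, candidates))
    return (x, y, math.atan2(sin_sum, cos_sum))


# Two anchors agree on the new frame; a third, inconsistent proposal gets
# zero confidence and therefore does not affect the result.
anchors = [(0.0, 0.0, 0.0), (1.0, 0.0, math.pi / 2), (0.0, 0.0, 0.0)]
relatives = [(1.0, 1.0, math.pi / 2), (1.0, 0.0, 0.0), (2.0, 2.0, 0.0)]
confidences = [1.0, 1.0, 0.0]
fused = aggregate_pose(anchors, relatives, confidences)
```

In this toy setup `fused` is approximately `(1, 1, π/2)`. Because every proposal is built from a relative pose and a nearby anchor, no quantity in the computation depends on how far the trajectory has drifted from an arbitrary global origin, which is the property the abstract highlights.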
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Pose Estimation | ETH3D | -- | 49 |
| Camera pose estimation | Sintel ~50 frames | ATE 0.115 | 41 |
| Camera pose estimation | TUM-dynamics 90 frames | ATE 0.012 | 24 |
| Camera pose estimation | Scannet 90 frames | ATE 0.037 | 24 |
| Point Map Estimation | 7-Scenes sparse view | Mean Accuracy 9.2 | 17 |
| Point Map Estimation | NRGBD sparse view | Accuracy (Mean) 4.7 | 17 |
| 3D Reconstruction | 7-Scenes Length 200 | Accuracy (Mean) 0.021 | 10 |
| 3D Reconstruction | 7-Scenes length 1000 | Accuracy (Mean) 2.2 | 9 |
| 3D Reconstruction | 7-Scenes length 500 | Accuracy 2.2 | 6 |
| Pose Estimation | RobustNeRF | ATE 0.152 | 4 |