RAFT-3D: Scene Flow using Rigid-Motion Embeddings
About
We address the problem of scene flow: given a pair of stereo or RGB-D video frames, estimate pixelwise 3D motion. We introduce RAFT-3D, a new deep architecture for scene flow. RAFT-3D is based on the RAFT model developed for optical flow but iteratively updates a dense field of pixelwise SE3 motion instead of 2D motion. A key innovation of RAFT-3D is rigid-motion embeddings, which represent a soft grouping of pixels into rigid objects. Integral to rigid-motion embeddings is Dense-SE3, a differentiable layer that enforces geometric consistency of the embeddings. Experiments show that RAFT-3D achieves state-of-the-art performance. On FlyingThings3D, under the two-view evaluation, we improve the best published accuracy (δ < 0.05) from 34.3% to 83.7%. On KITTI, we achieve an error of 5.77, outperforming the best published method (6.31), despite using no object instance supervision. Code is available at https://github.com/princeton-vl/RAFT-3D.
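The central representation in the abstract, a dense field of pixelwise SE3 motions, can be illustrated with a minimal numpy sketch. It lifts each pixel of the first frame to 3D using a depth map (as available from RGB-D or stereo), applies that pixel's own rigid motion, and reprojects to obtain the induced 2D optical flow and 3D scene flow. This is not the RAFT-3D implementation: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the rotation-matrix-plus-translation parameterization, and all function names are assumptions made for illustration.

```python
import numpy as np


def pixel_grid(h, w):
    """Return (u, v) pixel coordinates as two HxW float arrays."""
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return u.astype(float), v.astype(float)


def backproject(depth, fx, fy, cx, cy):
    """Lift an HxW depth map to HxWx3 camera-frame points (assumed pinhole model)."""
    u, v = pixel_grid(*depth.shape)
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def project(points, fx, fy, cx, cy):
    """Project HxWx3 camera-frame points to HxWx2 pixel coordinates.

    Assumes every point has positive depth (z > 0).
    """
    x, y, z = points[..., 0], points[..., 1], points[..., 2]
    return np.stack([fx * x / z + cx, fy * y / z + cy], axis=-1)


def apply_se3_field(R, t, points):
    """Apply a different rigid motion at every pixel.

    R: HxWx3x3 rotation matrices, t: HxWx3 translations, points: HxWx3.
    """
    return np.einsum("hwij,hwj->hwi", R, points) + t


def induced_flows(depth, R, t, fx, fy, cx, cy):
    """Return (optical flow HxWx2, scene flow HxWx3) implied by the SE3 field."""
    p1 = backproject(depth, fx, fy, cx, cy)
    p2 = apply_se3_field(R, t, p1)
    u, v = pixel_grid(*depth.shape)
    flow2d = project(p2, fx, fy, cx, cy) - np.stack([u, v], axis=-1)
    flow3d = p2 - p1
    return flow2d, flow3d
```

For example, with constant depth 2.0, identity rotations, and a translation of 0.1 along x at every pixel, each pixel's optical flow is `fx * 0.1 / 2` pixels horizontally. Assigning different `(R, t)` to different image regions models several independently moving rigid objects, which is the piecewise-rigid structure that the rigid-motion embeddings softly group.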
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Optical Flow | KITTI 2015 (test) | Fl Error (All) | 4.29 | 95 |
| Disparity Estimation | KITTI 2015 (test) | D1 Error (bg, all) | 1.48 | 77 |
| Optical Flow | MPI Sintel (train) | EPE (Final) | 2.91 | 63 |
| Scene Flow Estimation | FlyingThings3D with occlusions (F3Do) (test) | EPE3D | 0.064 | 28 |
| Scene Flow | KITTI Scene Flow 2015 (test) | D1 Score (All) | 1.81 | 28 |
| Optical Flow | FlyingThings3D (val) | EPE2D | 2.37 | 15 |
| Scene Flow | FlyingThings3D (val) | EPE3D | 0.062 | 14 |
| Scene Flow | KITTI Scene Flow (test) | D1 Error (noc) | 1.63 | 12 |
| Scene Flow | Event-KITTI Night | EPE | 0.104 | 10 |
| Scene Flow Estimation | FlyingThings3D F3Dc all Clean (test) | EPE3D | 0.094 | 6 |
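The metrics in the table and abstract fall into two families: mean end-point error (EPE, EPE2D, EPE3D) and thresholded rates (the abstract's δ < 0.05 accuracy and KITTI's Fl/D1 outlier percentages). The sketch below computes both under stated assumptions. It treats δ as the 3D end-point error, as its pairing with EPE3D suggests, and uses the commonly stated KITTI 2015 outlier criterion (error above 3 px and above 5% of the ground-truth magnitude). It is not the official evaluation code: benchmark scripts may differ in masking (occlusions, depth limits) and units, and the function names are hypothetical.

```python
import numpy as np


def _errors(pred, gt):
    """Per-pixel L2 error over the last axis (vector components)."""
    return np.linalg.norm(pred - gt, axis=-1)


def epe(pred, gt, valid=None):
    """Mean end-point error, optionally restricted to a boolean `valid` mask."""
    err = _errors(pred, gt)
    if valid is not None:
        err = err[valid]
    return float(err.mean())


def accuracy_below(pred, gt, thresh=0.05, valid=None):
    """Fraction of pixels whose end-point error is strictly below `thresh`."""
    err = _errors(pred, gt)
    if valid is not None:
        err = err[valid]
    return float((err < thresh).mean())


def kitti_outlier_rate(pred, gt, valid=None):
    """Percentage of outliers: error > 3 px AND error > 5% of |gt| (assumed criterion)."""
    err = _errors(pred, gt)
    mag = np.linalg.norm(gt, axis=-1)
    outlier = (err > 3.0) & (err > 0.05 * mag)
    if valid is not None:
        outlier = outlier[valid]
    return 100.0 * float(outlier.mean())
```

For a disparity-based score such as D1, pass per-pixel disparities with a trailing axis of size 1 (for example `d[..., None]`) so the norm reduces to an absolute difference.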