ZeroFlow: Scalable Scene Flow via Distillation
About
Scene flow estimation is the task of describing the 3D motion field between temporally successive point clouds. State-of-the-art methods use strong priors and test-time optimization techniques, but require on the order of tens of seconds to process full-size point clouds, making them unusable as computer vision primitives for real-time applications such as open-world object detection. Feedforward methods are considerably faster, running on the order of tens to hundreds of milliseconds for full-size point clouds, but require expensive human supervision. To address both limitations, we propose Scene Flow via Distillation, a simple, scalable distillation framework that uses a label-free optimization method to produce pseudo-labels that supervise a feedforward model. Our instantiation of this framework, ZeroFlow, achieves state-of-the-art performance on the Argoverse 2 Self-Supervised Scene Flow Challenge while using zero human labels, simply by training on large-scale, diverse unlabeled data. At test time, ZeroFlow is over 1000x faster than label-free state-of-the-art optimization-based methods on full-size point clouds (34 FPS vs. 0.028 FPS), and training it on unlabeled data is over 1000x cheaper than the cost of human annotation (\$394 vs. ~\$750,000). To facilitate further research, we release our code, trained model weights, and high-quality pseudo-labels for the Argoverse 2 and Waymo Open datasets at https://vedder.io/zeroflow.html
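The distillation recipe described above (a slow, label-free method produces pseudo-labels once, offline, and a fast feedforward model is trained on them) can be sketched as follows. This is a toy illustration under stated assumptions, not ZeroFlow's implementation: `nearest_neighbor_flow` is a hypothetical stand-in for the paper's label-free optimization method, `LinearStudent` is a hypothetical six-parameter model standing in for a neural network, and the synthetic data are rigidly shifted grids chosen so that the teacher's pseudo-labels are exact.

```python
import itertools
import math
import random

Point = tuple[float, float, float]


def nearest_neighbor_flow(src: list[Point], dst: list[Point]) -> list[Point]:
    """Hypothetical label-free teacher (a stand-in, not ZeroFlow's optimizer):
    each source point's flow is the offset to its nearest point in the next frame."""
    flows = []
    for p in src:
        q = min(dst, key=lambda d: math.dist(p, d))
        flows.append(tuple(qi - pi for pi, qi in zip(p, q)))
    return flows


def centroid(points: list[Point]) -> Point:
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


class LinearStudent:
    """Hypothetical feedforward student: flow = w * (centroid shift) + b, per axis.
    Inference is one cheap pass with no per-scene optimization."""

    def __init__(self) -> None:
        self.w = [0.0, 0.0, 0.0]
        self.b = [0.0, 0.0, 0.0]

    def features(self, src: list[Point], dst: list[Point]) -> list[float]:
        c0, c1 = centroid(src), centroid(dst)
        return [c1[i] - c0[i] for i in range(3)]

    def predict(self, src: list[Point], dst: list[Point]) -> list[Point]:
        x = self.features(src, dst)
        flow = tuple(self.w[i] * x[i] + self.b[i] for i in range(3))
        return [flow] * len(src)

    def train_step(self, src, dst, labels, lr: float) -> None:
        x = self.features(src, dst)
        preds = self.predict(src, dst)
        n = len(src)
        for i in range(3):
            # Mean residual vs. the pseudo-labels; gradient of the per-axis MSE.
            err = sum(pr[i] - lab[i] for pr, lab in zip(preds, labels)) / n
            self.w[i] -= lr * 2 * err * x[i]
            self.b[i] -= lr * 2 * err


def distill(pairs, teacher, student, epochs: int = 200, lr: float = 0.5):
    # Stage 1: run the slow, label-free teacher once, offline, to build pseudo-labels.
    dataset = [(src, dst, teacher(src, dst)) for src, dst in pairs]
    # Stage 2: supervise the fast feedforward student on those pseudo-labels only.
    for _ in range(epochs):
        for src, dst, labels in dataset:
            student.train_step(src, dst, labels, lr)
    return student


def make_pair(rng: random.Random):
    """Synthetic frame pair: a 4x4x4 grid (1 m spacing) shifted by <= 0.2 m per axis."""
    grid = [tuple(float(v) for v in xyz) for xyz in itertools.product(range(4), repeat=3)]
    t = tuple(rng.uniform(-0.2, 0.2) for _ in range(3))
    moved = [tuple(p[i] + t[i] for i in range(3)) for p in grid]
    return grid, moved, t


rng = random.Random(0)
train_pairs = [make_pair(rng)[:2] for _ in range(20)]  # no ground truth kept
student = distill(train_pairs, nearest_neighbor_flow, LinearStudent())

src, dst, true_t = make_pair(rng)
pred = student.predict(src, dst)[0]
print("max abs error vs. true shift:", max(abs(p - t) for p, t in zip(pred, true_t)))
```

The property this sketch tries to preserve is the cost split: the expensive teacher runs once per training pair to build the pseudo-label set, while inference only calls the cheap student, and no human labels enter the loop.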
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| LiDAR Scene Flow Estimation | Argoverse v2 (val) | EPE (m) - Dynamic Foreground | 0.131 | 23 |
| LiDAR Scene Flow Estimation | Waymo Open Dataset 1.0 (val) | Dynamic Foreground EPE (m) | 0.2229 | 21 |
| Scene Flow Estimation | Argoverse 2 Scene Flow Challenge 2024 (test) | Error Rate (BG) | 0.013 | 12 |
| Scene Flow Estimation | Waymo Open | Threeway EPE | 0.092 | 10 |
| Scene Flow Estimation | Waymo Open Dataset Longer Temporal Horizon (5 consecutive frames) | Dynamic Foreground EPE (m) | 0.7097 | 8 |
| Scene Flow Estimation | Argoverse Static Foreground v2 (test) | EPE (m) | 0.0205 | 7 |
| Scene Flow Estimation | Argoverse Static Background v2 (test) | EPE (m) | 0.0125 | 7 |
| LiDAR Scene Flow Estimation | Argoverse Successive time steps v2 | EPE (Dynamic Foreground) | 0.2244 | 7 |
| Scene Flow Estimation | Argoverse Dynamic Foreground v2 (test) | EPE (m) | 0.2244 | 7 |
| Scene Flow Estimation | Argoverse 2 Sensor online leaderboard (test) | EPE 3-Way | 0.0569 | 6 |
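Most results in the table are endpoint errors (EPE), in meters, split by point category. The sketch below shows how a per-category EPE and a three-way average could be computed. It assumes the common definition (not stated on this page) in which three-way EPE is the unweighted mean of EPE over dynamic foreground, static foreground, and background points; the 0.05 m-per-frame-pair `dynamic_thresh` and the function names are illustrative assumptions, not an official evaluation API.

```python
import math

Vec = tuple[float, float, float]


def epe(pred: list[Vec], gt: list[Vec]) -> float:
    """Mean endpoint error: average Euclidean distance between predicted and
    ground-truth flow vectors (meters, per frame pair)."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def threeway_epe(pred, gt, is_foreground, dynamic_thresh: float = 0.05):
    """Unweighted mean of EPE over three buckets. Assumptions (not from this page):
    background = every non-foreground point; a foreground point is 'dynamic' when
    the norm of its ground-truth flow exceeds dynamic_thresh meters."""
    buckets: dict[str, tuple[list[Vec], list[Vec]]] = {
        "fg_dynamic": ([], []),
        "fg_static": ([], []),
        "background": ([], []),
    }
    for p, g, fg in zip(pred, gt, is_foreground):
        if not fg:
            key = "background"
        elif math.hypot(*g) > dynamic_thresh:
            key = "fg_dynamic"
        else:
            key = "fg_static"
        buckets[key][0].append(p)
        buckets[key][1].append(g)
    # Skip empty buckets so a frame without, e.g., moving objects cannot divide by zero.
    per_bucket = {k: epe(ps, gs) for k, (ps, gs) in buckets.items() if gs}
    return sum(per_bucket.values()) / len(per_bucket), per_bucket


# Toy example: one point per bucket.
gt = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
pred = [(0.8, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.02)]
fg = [True, True, False]
score, parts = threeway_epe(pred, gt, fg)
print(round(score, 4))  # 0.1067, the mean of 0.2, 0.1 and 0.02
```

Averaging per bucket, rather than over all points, keeps the abundant static background from swamping errors on the comparatively few moving foreground points, which is why the table reports those categories separately.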