Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision
About
This work proposes a novel approach to 4D radar-based scene flow estimation via cross-modal learning. Our approach is motivated by the co-located sensing redundancy in modern autonomous vehicles, which implicitly provides various forms of supervision cues for radar scene flow estimation. Specifically, we introduce a multi-task model architecture for the identified cross-modal learning problem and propose loss functions that opportunistically constrain scene flow estimation with multiple cross-modal signals for effective model training. Extensive experiments show that our method achieves state-of-the-art performance and demonstrate that cross-modal supervised learning yields more accurate 4D radar scene flow. We also show its usefulness for two subtasks: motion segmentation and ego-motion estimation. Our source code will be available at https://github.com/Toytiny/CMFlow.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Odometry | View-of-Delft (VoD) sequence 24 | t_rel (relative translation error) | 0.12 | 14 |
| Odometry | View-of-Delft (VoD) sequence 17 | t_rel (relative translation error) | 0.06 | 14 |
| Odometry | View-of-Delft (VoD) sequence 19 | t_rel (relative translation error) | 0.28 | 14 |
| Odometry | View-of-Delft (VoD) sequence 09 | t_rel (relative translation error) | 0.09 | 14 |
| Odometry | View-of-Delft (VoD) sequence 22 | t_rel (relative translation error) | 0.14 | 14 |
| Odometry | View-of-Delft (VoD) Mean | t_rel (relative translation error) | 0.11 | 14 |
| Odometry | View-of-Delft (VoD) sequence 04 | t_rel (relative translation error) | 5 | 14 |
| Odometry | View-of-Delft (VoD) sequence 03 | t_rel (relative translation error) | 0.06 | 12 |
| Scene Flow Estimation | View-of-Delft (VoD) (test) | EPE (m) | 0.13 | 9 |
| Scene Flow Estimation | View-of-Delft (VoD) radar evaluation (val) | 3-way EPE | 0.118 | 3 |
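
For readers unfamiliar with the scene flow metric above, EPE (end-point error) is conventionally the mean Euclidean distance, in metres, between predicted and ground-truth per-point flow vectors. The snippet below is a minimal sketch of that standard definition, not the evaluation code from the CMFlow repository; the function name `end_point_error` and the sample values are hypothetical, and the 3-way EPE variant is not covered here.

```python
import math


def end_point_error(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    Both arguments are sequences of (dx, dy, dz) tuples in metres, one per point.
    This is an illustrative helper, not part of any released codebase.
    """
    if len(pred_flow) != len(gt_flow) or len(pred_flow) == 0:
        raise ValueError("flows must be non-empty and of equal length")
    total = 0.0
    for pred, gt in zip(pred_flow, gt_flow):
        # math.dist computes the L2 distance between two points (Python 3.8+)
        total += math.dist(pred, gt)
    return total / len(pred_flow)


# Made-up example: two points, the second prediction is off by 0.5 m along z
pred = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
gt = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]
epe = end_point_error(pred, gt)
print(f"EPE: {epe:.3f} m")
```

Lower values are better: an EPE of 0.13 m means the predicted flow vectors are, on average, 13 cm away from the ground-truth displacement.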