# CORAL: Colored structural representation for bi-modal place recognition

## About
Place recognition is indispensable for a drift-free localization system. Due to environmental variations, place recognition using a single modality has limitations. In this paper, we propose a bi-modal place recognition method that extracts a compound global descriptor from two modalities, vision and LiDAR. Specifically, we first build an elevation image from 3D points as a structural representation. Then, we derive the correspondences between 3D points and image pixels, which are used to merge the pixel-wise visual features into the elevation map grids. In this way, we fuse the structural and visual features in a consistent bird's-eye-view frame, yielding a semantic representation named CORAL; the whole network is called CORAL-VLAD. Comparisons on the Oxford RobotCar dataset show that CORAL-VLAD outperforms other state-of-the-art methods. We also demonstrate that our network generalizes to other scenes and sensor configurations on cross-city datasets.
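The fusion described above — LiDAR elevation plus projected pixel features accumulated in a shared bird's-eye-view grid — can be sketched as follows. This is a minimal NumPy illustration under assumed conventions, not the paper's implementation: the function name, grid size, resolution, intrinsics `K`, and extrinsics `T_cam_lidar` are all hypothetical placeholders.

```python
import numpy as np

def build_coral_representation(points, image_feats, K, T_cam_lidar,
                               grid_res=0.5, grid_size=64):
    """Fuse LiDAR elevation and per-pixel visual features into one BEV grid.

    points:      (N, 3) LiDAR points, sensor frame (x forward, y left, z up)
    image_feats: (H, W, C) per-pixel feature map from an image backbone
    K:           (3, 3) camera intrinsics
    T_cam_lidar: (4, 4) extrinsics mapping LiDAR points into the camera frame
    """
    H, W, C = image_feats.shape
    # Channel 0: elevation; channels 1..C: averaged visual features.
    grid = np.zeros((grid_size, grid_size, 1 + C))
    counts = np.zeros((grid_size, grid_size, 1))

    # BEV cell index for each point (grid centered on the sensor).
    half = grid_size * grid_res / 2.0
    ix = ((points[:, 0] + half) / grid_res).astype(int)
    iy = ((points[:, 1] + half) / grid_res).astype(int)
    in_grid = (ix >= 0) & (ix < grid_size) & (iy >= 0) & (iy < grid_size)

    # Project 3D points into the image to find pixel correspondences.
    p_cam = (T_cam_lidar @ np.c_[points, np.ones(len(points))].T)[:3]
    z = p_cam[2]
    uv = K @ p_cam
    u = np.where(z > 0, uv[0] / np.maximum(z, 1e-6), -1).astype(int)
    v = np.where(z > 0, uv[1] / np.maximum(z, 1e-6), -1).astype(int)
    visible = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    for i in np.flatnonzero(in_grid):
        cx, cy = ix[i], iy[i]
        # Keep the maximum height per cell (assumes heights above ground 0).
        grid[cy, cx, 0] = max(grid[cy, cx, 0], points[i, 2])
        if visible[i]:
            # Scatter the corresponding pixel feature into the BEV cell.
            grid[cy, cx, 1:] += image_feats[v[i], u[i]]
            counts[cy, cx, 0] += 1
    grid[..., 1:] /= np.maximum(counts, 1)  # average features per cell
    return grid
```

The key point is that both modalities end up indexed by the same BEV cell, so a downstream network (e.g. a NetVLAD head, as in CORAL-VLAD) sees a single spatially aligned tensor rather than two unaligned feature maps.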
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Place Recognition | Oxford RobotCar | Avg Recall @ 1% | 96.1 | 43 |
| Place Recognition | Oxford RobotCar (test) | Avg Recall @ 1% | 96.13 | 27 |
| Place Recognition | KITTI odometry | AR @ 1% | 76.4 | 6 |
| Place Recognition | KITTI laser generalization (test) | Recall @ 1 | 76.43 | 4 |
| Place Recognition | KITTI stereo generalization (test) | Recall @ 1 | 70.77 | 4 |
| Place Recognition | YQ generalization (test) | Recall @ 1 | 73.82 | 4 |