
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition

About

We introduce a discriminative multimodal descriptor based on a pair of sensor readings: a point cloud from a LiDAR and an image from an RGB camera. Our descriptor, named MinkLoc++, can be used for place recognition, re-localization, and loop closure in robotics or autonomous vehicle applications. We use a late fusion approach, where each modality is processed separately and the results are fused in the final part of the processing pipeline. The proposed method achieves state-of-the-art performance on standard place recognition benchmarks. We also identify the dominating modality problem that arises when training a multimodal descriptor. The problem manifests itself when the network focuses on the modality that overfits the training data more strongly. This drives the loss down during training but leads to suboptimal performance on the evaluation set. In this work we describe how to detect and mitigate this risk when using a deep metric learning approach to train a multimodal neural network. Our code is publicly available on the project website: https://github.com/jac99/MinkLocMultimodal.
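The late fusion idea and the dominating modality check can be illustrated with a small sketch. It is not taken from the project repository. It assumes that fusion is a plain concatenation of L2-normalized unimodal descriptors and that a triplet margin loss is tracked per modality alongside the fused one. The function names, the margin value, and the toy descriptors are all hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a descriptor to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def late_fusion(cloud_descriptor, image_descriptor):
    """Fuse two unimodal descriptors by concatenation (an assumed fusion rule)."""
    return l2_normalize(cloud_descriptor) + l2_normalize(image_descriptor)


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss on Euclidean distances."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def per_modality_losses(anchor, positive, negative, margin=0.2):
    """Each argument is a (cloud_descriptor, image_descriptor) pair.

    Returns the loss for each modality on its own and for the fused descriptor,
    so that a modality whose loss collapses much faster than the other stands out.
    """
    return {
        "cloud": triplet_margin_loss(
            l2_normalize(anchor[0]), l2_normalize(positive[0]),
            l2_normalize(negative[0]), margin),
        "image": triplet_margin_loss(
            l2_normalize(anchor[1]), l2_normalize(positive[1]),
            l2_normalize(negative[1]), margin),
        "fused": triplet_margin_loss(
            late_fusion(*anchor), late_fusion(*positive),
            late_fusion(*negative), margin),
    }


# Toy triplet: the image branch separates the places, the point cloud branch does not.
anchor = ([1.0, 0.0], [1.0, 0.0])
positive = ([0.0, 1.0], [1.0, 0.0])
negative = ([1.0, 0.0], [0.0, 1.0])

losses = per_modality_losses(anchor, positive, negative)
print(losses)
```

In this toy triplet the image loss is already zero while the point cloud loss is still large. If that imbalance appears on training data but not on held-out data, it would suggest the image branch is the one overfitting. Summing the per-modality terms with the fused term is one possible mitigation, since it keeps a training signal on the weaker branch.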

Jacek Komorowski, Monika Wysoczanska, Tomasz Trzcinski • 2021

Related benchmarks

Task               Dataset                                         Result                 Rank
Place Recognition  Oxford RobotCar                                 Avg Recall @ 1%: 99.1  43
Place Recognition  nuScenes (BS)                                   AR@1: 76.72            18
Place Recognition  nuScenes (SON)                                  AR@1: 89.39            17
Place Recognition  NCLT (Query: 2012-06-15, Database: 2012-01-08)  AR@1: 42.77            16
Place Recognition  NCLT (Query: 2013-02-23, Database: 2012-01-08)  AR@1: 0.3544           16
Place Recognition  Self-collected dataset                          AR@1: 62.67            11
Place Recognition  nuScenes (Singapore-Queenstown (SQ) Split)      AR@1: 89.94            9
Place Recognition  NCLT (Query: 2013-04-05, Database: 2012-01-08)  AR@1: 36.38            9
Place Recognition  KITTI odometry                                  AR@1%: 82.1            6
Place Recognition  RobotCar Seasons                                Recall (Dawn): 93.6    5
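The AR@1 and AR@1% figures in the benchmark results are recall-style retrieval metrics. The sketch below shows how such a number is commonly computed: a query counts as a hit if at least one true match appears among its N nearest database descriptors. The function names, the rounding rule for the 1% cutoff, and the toy data are assumptions for illustration, not the evaluation code used by any of the listed benchmarks.

```python
import math


def recall_at_n(query_descriptors, database_descriptors, true_matches, n=1):
    """Fraction of queries with a true match among the n nearest database entries.

    true_matches[i] is the set of database indices showing the same place as query i.
    Queries with no true match in the database are skipped.
    """
    hits = 0
    evaluated = 0
    for i, query in enumerate(query_descriptors):
        if not true_matches[i]:
            continue
        evaluated += 1
        # Rank database entries by Euclidean distance to the query descriptor.
        ranked = sorted(
            range(len(database_descriptors)),
            key=lambda j: math.dist(query, database_descriptors[j]),
        )
        if any(j in true_matches[i] for j in ranked[:n]):
            hits += 1
    return hits / evaluated if evaluated else 0.0


def recall_at_one_percent(query_descriptors, database_descriptors, true_matches):
    """Recall with n set to 1% of the database size, at least 1 (assumed rounding)."""
    n = max(1, round(len(database_descriptors) / 100))
    return recall_at_n(query_descriptors, database_descriptors, true_matches, n)


# Toy example with 2-D descriptors.
database = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
queries = [[0.1, 0.0], [4.0, 4.0], [0.9, 0.1]]
matches = [{0}, {2}, {0}]

recall_1 = recall_at_n(queries, database, matches, n=1)
recall_2 = recall_at_n(queries, database, matches, n=2)
print(recall_1, recall_2)
```

Here the third query is closest to a wrong database entry, so it misses at N = 1 but is recovered at N = 2. An "average" recall would then be the mean of this value over all query and database set pairs in a benchmark.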