SeqNet: Learning Descriptors for Sequence-based Hierarchical Place Recognition

About

Visual Place Recognition (VPR) is the task of matching current visual imagery from a camera to images stored in a reference map of the environment. While initial VPR systems used simple direct image methods or hand-crafted visual features, recent work has focused on learning more powerful visual features and further improving performance through either some form of sequential matcher / filter or a hierarchical matching process. In both cases the performance of the initial single-image-based system is still far from perfect, putting significant pressure on the sequence matching or (in the case of hierarchical systems) pose refinement stages. In this paper we present a novel hybrid system that creates a high-performance initial match hypothesis generator using short learnt sequential descriptors, which enable selectively controlled sequential score aggregation using single-image learnt descriptors. Sequential descriptors are generated using a temporal convolutional network dubbed SeqNet, which encodes short image sequences using 1-D convolutions; these descriptors are then matched against the corresponding temporal descriptors from the reference dataset to provide an ordered list of place match hypotheses. We then perform selective sequential score aggregation using shortlisted single-image learnt descriptors from a separate pipeline to produce an overall place match hypothesis. Comprehensive experiments on challenging benchmark datasets demonstrate that the proposed method outperforms recent state-of-the-art methods using the same amount of sequential information. Source code and supplementary material can be found at https://github.com/oravus/seqNet.
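The two-stage idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). The function names, the random filter weights, the mean pooling over time, the L2 normalisation, the Euclidean distance, and the `top_k` shortlist size are all assumptions made here for illustration. Stage 1 ranks reference windows by a sequence descriptor built with a 1-D temporal convolution. Stage 2 re-scores only the shortlisted windows by summing per-frame single-image descriptor distances.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale a vector to unit length (eps guards against division by zero).
    return x / (np.linalg.norm(x) + eps)

def temporal_conv_descriptor(frames, weights, bias):
    """Encode a short sequence into one descriptor.

    frames:  (L, D_in) single-image descriptors, in temporal order
    weights: (w, D_in, D_out) 1-D convolution filters over time (hypothetical)
    bias:    (D_out,)
    """
    w = weights.shape[0]
    steps = frames.shape[0] - w + 1
    # "Valid" 1-D convolution along the time axis.
    outputs = [
        np.einsum("td,tdo->o", frames[t:t + w], weights) + bias
        for t in range(steps)
    ]
    # Assumed here: average over time, then L2-normalise.
    return l2_normalize(np.mean(outputs, axis=0))

def sliding_windows(descs, length):
    # All length-`length` windows of the reference traverse: (N-length+1, length, D).
    return np.stack([descs[i:i + length] for i in range(len(descs) - length + 1)])

def hierarchical_match(query_frames, ref_frames, weights, bias, top_k=5):
    length = query_frames.shape[0]
    ref_windows = sliding_windows(ref_frames, length)

    # Stage 1: rank reference windows by sequence-descriptor distance.
    q_seq = temporal_conv_descriptor(query_frames, weights, bias)
    r_seq = np.stack([temporal_conv_descriptor(win, weights, bias) for win in ref_windows])
    shortlist = np.argsort(np.linalg.norm(r_seq - q_seq, axis=1))[:top_k]

    # Stage 2: aggregate single-image distances, but only for shortlisted windows.
    agg = np.array([
        np.linalg.norm(ref_windows[c] - query_frames, axis=1).sum() for c in shortlist
    ])
    return int(shortlist[np.argmin(agg)]), shortlist

# Toy usage: the query is an exact copy of the reference window starting at frame 12.
rng = np.random.default_rng(0)
ref = rng.normal(size=(30, 8))
weights = rng.normal(size=(3, 8, 16))
bias = np.zeros(16)
best, shortlist = hierarchical_match(ref[12:17].copy(), ref, weights, bias)
print(best, shortlist)
```

Restricting the second stage to the shortlist is what makes the score aggregation "selective": the per-frame comparison runs on `top_k` candidates rather than on every reference window.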

Sourav Garg, Michael Milford • 2021

Related benchmarks

Task	Dataset	Result
Visual Place Recognition	Nordland	Recall@1 79.43	163
Visual Place Recognition	MSLS SF	Recall@1 55.6	22
Sequence-level Visual Place Recognition	Oxford1 pos=2m (test)	R@1 57.4	16
Sequence-level Visual Place Recognition	MSLS pos=25m (val)	Recall@1 71.1	16
Sequence-level Visual Place Recognition	NordLand pos=10f (test)	R@1 0.619	16
Sequence-level Visual Place Recognition	Oxford2 pos=2m (test)	R@1 16.5	16
Visual Place Recognition	MSLS Amman	Recall@1 27	13
Sequential Visual Place Recognition	nuScenes	Recall@1 8.27	11
Sequential Visual Place Recognition	Oxford (Hard)	Recall@1 30.64	11
Sequential Visual Place Recognition	Oxford Easy	Recall@1 53.63	11

Showing 10 of 15 rows
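For readers unfamiliar with the metric in the table, Recall@K is commonly computed as the fraction of queries for which at least one of the top K retrieved references lies within a tolerance of the ground-truth place. The sketch below assumes places are indexed by frame number and that the tolerance is given in frames. The function name and arguments are hypothetical, and individual benchmarks may define the tolerance differently (for example in metres).

```python
import numpy as np

def recall_at_k(ranked_predictions, ground_truth, k=1, tolerance=0):
    """Fraction of queries with a correct match among the top-k predictions.

    ranked_predictions: (Q, >=k) reference indices per query, best match first
    ground_truth:       (Q,) true reference index per query
    tolerance:          allowed index offset (in frames) for a match to count
    """
    ranked = np.asarray(ranked_predictions)[:, :k]
    gt = np.asarray(ground_truth)[:, None]
    hits = (np.abs(ranked - gt) <= tolerance).any(axis=1)
    return float(hits.mean())

# Three queries, two ranked predictions each.
preds = [[3, 7], [10, 2], [5, 6]]
truth = [4, 2, 9]
print(recall_at_k(preds, truth, k=1, tolerance=1))
print(recall_at_k(preds, truth, k=2, tolerance=1))
```

With a one-frame tolerance, only the first query is matched at K=1. Raising K to 2 also recovers the second query, whose correct reference is ranked second.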
