
SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition

About

Point cloud-based large-scale place recognition is fundamental to many applications, such as Simultaneous Localization and Mapping (SLAM). Although many models have been proposed and achieve good performance by learning short-range local features, long-range contextual properties have often been neglected. Moreover, model size has become a bottleneck for their wide application. To overcome these challenges, we propose a super light-weight network termed SVT-Net for large-scale place recognition. Specifically, on top of the highly efficient 3D Sparse Convolution (SP-Conv), an Atom-based Sparse Voxel Transformer (ASVT) and a Cluster-based Sparse Voxel Transformer (CSVT) are proposed to learn both short-range local features and long-range contextual features. Built from ASVT and CSVT, SVT-Net achieves state-of-the-art accuracy and speed on benchmark datasets with a super-light model size (0.9M). Meanwhile, two simplified versions of SVT-Net are introduced, which also achieve state-of-the-art performance and further reduce the model size to 0.8M and 0.4M, respectively.
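To make the difference between the two attention modules concrete, here is a hedged, simplified sketch in plain Python. It is not the authors' implementation. It assumes single-head attention, a dict of occupied voxel coordinates standing in for SP-Conv output, and, for the cluster-based variant, hypothetical learned cluster `centers` with a softmax soft assignment. The names `atom_attention`, `cluster_attention`, and `centers` are illustrative only.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x len(x)) weight matrix by a feature vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feats: list[Vector], wq: Matrix, wk: Matrix, wv: Matrix) -> list[Vector]:
    """Single-head scaled dot-product self-attention plus a residual connection.

    wv must be C x C so the attended vector can be added back to each input.
    """
    q = [matvec(wq, f) for f in feats]
    k = [matvec(wk, f) for f in feats]
    v = [matvec(wv, f) for f in feats]
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for f, qi in zip(feats, q):
        weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        attended = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(f))]
        out.append([x + y for x, y in zip(f, attended)])
    return out


def atom_attention(voxels: dict, wq: Matrix, wk: Matrix, wv: Matrix) -> dict:
    """ASVT-style sketch: every occupied voxel attends to every other occupied voxel."""
    coords = list(voxels)
    updated = self_attention([voxels[c] for c in coords], wq, wk, wv)
    return dict(zip(coords, updated))


def cluster_attention(voxels: dict, centers: list[Vector], wq: Matrix, wk: Matrix, wv: Matrix) -> dict:
    """CSVT-style sketch: attention runs among K cluster tokens instead of N voxels."""
    coords = list(voxels)
    feats = [voxels[c] for c in coords]
    dim = len(feats[0])
    # Soft-assign each voxel to the (hypothetical, learned) cluster centers.
    assign = [softmax([sum(a * b for a, b in zip(f, ctr)) for ctr in centers]) for f in feats]
    # Each cluster token is the assignment-weighted mean of the voxel features.
    tokens = []
    for k in range(len(centers)):
        mass = sum(a[k] for a in assign)  # > 0 because softmax weights are positive
        tokens.append([sum(a[k] * f[d] for a, f in zip(assign, feats)) / mass for d in range(dim)])
    # Long-range context: self-attention over the K tokens (O(K^2) instead of O(N^2)).
    tokens = self_attention(tokens, wq, wk, wv)
    # Scatter cluster context back to each voxel using the same assignment weights.
    return {
        c: [f[d] + sum(a[k] * tokens[k][d] for k in range(len(centers))) for d in range(dim)]
        for c, f, a in zip(coords, feats, assign)
    }


def random_matrix(rows: int, cols: int) -> Matrix:
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
C = 4  # feature channels per voxel (hypothetical)
# Sparse voxel grid: only occupied (x, y, z) cells are stored.
occupied = [(0, 0), (0, 5), (3, 1), (7, 7), (2, 9)]
voxels = {(x, y, 0): [random.gauss(0.0, 1.0) for _ in range(C)] for x, y in occupied}
wq, wk, wv = random_matrix(C, C), random_matrix(C, C), random_matrix(C, C)
centers = [[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(2)]

local_ctx = atom_attention(voxels, wq, wk, wv)
global_ctx = cluster_attention(voxels, centers, wq, wk, wv)
print(len(local_ctx), len(global_ctx))  # 5 5
```

The point the sketch tries to show is cost: atom-style attention scales as O(N^2) in the number of occupied voxels, while routing long-range context through K cluster tokens (K much smaller than N) needs O(NK) work for the assignment and only O(K^2) for the attention step itself.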

Zhaoxin Fan, Zhenbo Song, Hongyan Liu, Zhiwu Lu, Jun He, Xiaoyong Du · 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Place Recognition | Oxford RobotCar | Avg Recall @ 1% | 98 | 43 |
| Place Recognition | Oxford | AR@1% | 98.4 | 42 |
| Place Recognition | R.A. | AR@1 (%) | 99.5 | 40 |
| Place Recognition | B.D. | AR@1% | 97.2 | 40 |
| Place Recognition | University Sectors (U.S.) | Recall@1% | 99.9 | 30 |
| Place Recognition | U.S. | AR@1% | 99.9 | 20 |
| Place Recognition | Residential Area (R.A.) | Avg Recall @ 1% | 92.7 | 10 |
| Place Recognition | Business District (B.D.) | Recall@1% | 90.7 | 10 |
| Place Recognition | Oxford (test) | Recall@1% | 98.6 | 10 |
| Place Recognition | U.S. University Sector (test) | Avg Recall @ 1% | 99.9 | 10 |

Showing 10 of 13 rows.
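Most rows above report recall at 1%. As a hedged illustration of how such a number is commonly computed, here is a minimal sketch assuming the PointNetVLAD-style retrieval protocol: each query's global descriptor retrieves the top k = max(round(N_db / 100), 1) database entries by Euclidean distance, and the query counts as a hit if any of them is a true match. The function name `recall_at_one_percent` and the `true_matches` input (in the benchmarks, derived from a distance threshold on ground-truth positions) are assumptions for illustration, not the authors' evaluation code.

```python
import math


def recall_at_one_percent(queries, database, true_matches):
    """Percentage of queries with at least one true match in the top-1% retrievals.

    queries, database: lists of global descriptors (lists of floats).
    true_matches: per query, the set of database indices that count as correct.
    Queries with no true match in the database are skipped (assumed protocol).
    """
    k = max(int(round(len(database) / 100.0)), 1)  # size of the top-1% candidate list
    hits = evaluated = 0
    for q, positives in zip(queries, true_matches):
        if not positives:
            continue
        evaluated += 1
        # Rank database entries by Euclidean distance to the query descriptor.
        ranked = sorted(range(len(database)), key=lambda i: math.dist(q, database[i]))
        if positives & set(ranked[:k]):
            hits += 1
    return 100.0 * hits / evaluated


# Toy 2-D descriptors: 200 database entries, so k = 2 candidates per query.
database = [[float(i), 0.0] for i in range(200)]
queries = [[10.2, 0.0], [50.6, 0.0], [120.0, 0.0], [30.4, 0.0]]
true_matches = [{10}, {50}, {199}, {31}]
print(recall_at_one_percent(queries, database, true_matches))  # 75.0
```

In the toy run, three of the four queries find a true match among their two nearest descriptors, and the query whose only match (index 199) is far away in descriptor space does not.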
