
PatchFormer: An Efficient Point Transformer with Patch Attention

About

The point cloud learning community has witnessed a modeling shift from CNNs to Transformers, with pure Transformer architectures achieving top accuracy on the major learning benchmarks. However, existing point Transformers are computationally expensive because they must generate a large attention map, which has quadratic complexity (in both space and time) with respect to input size. To address this shortcoming, we introduce Patch ATtention (PAT), which adaptively learns a much smaller set of bases upon which the attention maps are computed. Through a weighted summation over these bases, PAT not only captures the global shape context but also achieves linear complexity in the input size. In addition, we propose a lightweight Multi-Scale aTtention (MST) block that builds attention among features of different scales, providing the model with multi-scale features. Equipped with PAT and MST, we construct our neural architecture, PatchFormer, which integrates both modules into a joint framework for point cloud learning. Extensive experiments demonstrate that our network achieves comparable accuracy on general point cloud learning tasks with a 9.2x speed-up over previous point Transformers.
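The abstract's core idea, attending to a small set of M learned bases instead of all N points, can be sketched as follows. This is a minimal illustration of the linear-complexity mechanism described above, not the authors' implementation; all function and parameter names (`patch_attention`, `w_base`, the choice of softmax-weighted pooling to form the bases) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def patch_attention(x, w_base, w_q, w_k, w_v):
    """Attention over M learned bases (M << N), linear in N.

    x:             (N, d) input point features
    w_base:        (d, M) projection scoring each point against M bases
    w_q, w_k, w_v: (d, d) query/key/value projections
    """
    n, d = x.shape
    # Aggregate the N points into M bases by a softmax-weighted summation.
    scores = softmax(x @ w_base, axis=0)   # (N, M); each column sums to 1
    bases = scores.T @ x                   # (M, d)
    # Standard scaled dot-product attention, but the keys and values
    # come from the M bases rather than from all N points, so the
    # attention map is (N, M) instead of (N, N).
    q = x @ w_q                            # (N, d)
    k = bases @ w_k                        # (M, d)
    v = bases @ w_v                        # (M, d)
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (N, M)
    return attn @ v                        # (N, d)

# Example: 1024 points with 64-dim features, attending over 16 bases.
rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 64))
out = patch_attention(
    x,
    rng.normal(size=(64, 16)),
    rng.normal(size=(64, 64)),
    rng.normal(size=(64, 64)),
    rng.normal(size=(64, 64)),
)
```

Because the attention map is (N, M) with M fixed, both memory and compute grow linearly with the number of input points, which is the complexity claim made in the abstract.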

Zhang Cheng, Haocheng Wan, Xinyi Shen, Zizhao Wu · 2021

Related benchmarks

| Task                    | Dataset             | Result                 | Rank |
|-------------------------|---------------------|------------------------|------|
| Semantic segmentation   | S3DIS (Area 5)      | mIoU 68.1              | 799  |
| Part segmentation       | ShapeNetPart (test) | --                     | 312  |
| 3D shape classification | ModelNet40 (test)   | Accuracy 93.5          | 227  |
| Part segmentation       | ShapeNetPart        | --                     | 198  |
| Shape classification    | ModelNet40          | Accuracy 93.5          | 85   |
| Semantic segmentation   | S3DIS (test)        | mIoU 68.1              | 47   |
| Object classification   | ModelNet40          | Instance accuracy 93.5 | 33   |
