MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis

About

Recently, self-supervised pre-training has advanced Vision Transformers on various tasks across different data modalities, e.g., images and 3D point clouds. In this paper, we explore this learning paradigm for Transformer-based 3D mesh data analysis. Since applying Transformer architectures to new modalities is usually non-trivial, we first adapt the Vision Transformer to 3D mesh data processing, yielding the Mesh Transformer. Specifically, we divide a mesh into several non-overlapping local patches, each containing the same number of faces, and use the 3D position of each patch's center point to form positional embeddings. Inspired by MAE, we explore how pre-training on 3D mesh data with this Transformer-based structure benefits downstream 3D mesh analysis tasks. We first randomly mask some patches of the mesh and feed the corrupted mesh into the Mesh Transformer. Then, by reconstructing the information of the masked patches, the network learns discriminative representations for mesh data. We therefore name our method MeshMAE; it yields state-of-the-art or comparable performance on mesh analysis tasks, namely classification and segmentation. In addition, we conduct comprehensive ablation studies to show the effectiveness of the key designs in our method.
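The data flow described above (equal-size patches, positional embeddings from patch centers, random patch masking, and a loss on the masked patches only) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the patch partition is taken as already computed, a patch's center is assumed to be the mean of its face centroids, a fixed random linear projection stands in for whatever learned embedding layer the model uses, and all function names, the 0.5 mask ratio, and the reconstruction targets are hypothetical.

```python
import numpy as np


def patch_centers(vertices, faces, patch_face_ids):
    """Center point of each patch (assumed: mean of its face centroids).

    vertices: (V, 3) vertex positions; faces: (F, 3) vertex indices per triangle;
    patch_face_ids: (P, K) face indices per patch, every patch holding K faces.
    """
    face_centroids = vertices[faces].mean(axis=1)        # (F, 3)
    return face_centroids[patch_face_ids].mean(axis=1)   # (P, 3)


def positional_embeddings(centers, dim, rng):
    """Map each 3D center to a dim-sized vector.

    Hypothetical stand-in: a fixed random linear projection, not a learned layer.
    """
    proj = rng.standard_normal((3, dim))
    return centers @ proj                                 # (P, dim)


def random_patch_mask(num_patches, mask_ratio, rng):
    """Randomly split patch indices into (visible, masked) index arrays."""
    num_masked = int(round(num_patches * mask_ratio))
    perm = rng.permutation(num_patches)
    return np.sort(perm[num_masked:]), np.sort(perm[:num_masked])


def masked_mse(pred, target, masked):
    """Mean squared error over the masked patches only (MAE-style loss)."""
    diff = pred[masked] - target[masked]
    return float(np.mean(diff ** 2))


# Toy usage on a random "mesh": 8 patches x 4 faces each, 16-dim embeddings.
rng = np.random.default_rng(0)
num_patches, faces_per_patch, dim = 8, 4, 16
verts = rng.random((20, 3))
tris = rng.integers(0, 20, size=(num_patches * faces_per_patch, 3))
patch_face_ids = np.arange(num_patches * faces_per_patch).reshape(num_patches, faces_per_patch)

centers = patch_centers(verts, tris, patch_face_ids)
pos = positional_embeddings(centers, dim, rng)
visible, masked = random_patch_mask(num_patches, mask_ratio=0.5, rng=rng)
encoder_tokens = pos[visible]                  # only visible patches are encoded here
target = rng.standard_normal((num_patches, dim))  # hypothetical per-patch targets
pred = np.zeros_like(target)                   # placeholder for a decoder's output
loss = masked_mse(pred, target, masked)
```

Following the MAE recipe, this sketch hands only the visible patches' tokens to the encoder and evaluates the loss on the masked patches alone. That restriction is what forces the network to infer the hidden content from its context.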

Yaqian Liang, Shanshan Zhao, Baosheng Yu, Jing Zhang, Fazhi He · 2022

Related benchmarks

Task | Dataset | Metric | Result | Rank
3D Object Classification | ModelNet40 | -- | -- | 89
3D Classification | ScanObjectNN OBJ-BG (official) | Accuracy | 90 | 37
3D Object Classification | ScanObjectNN PB | Accuracy | 85.2 | 29
3D Object Classification | ScanObjectNN ONLY | Accuracy | 88.3 | 23
Dental Abutment Design | Oral Scan Dataset | Transgingival Distance | 28.61 | 17
Diameter Regression | Oral scan data | IoU | 65.16 | 8
Height Regression | Oral scan data | IoU | 31.52 | 8
Transgingival Regression | Oral scan data | mIoU | 42.84 | 8
