MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis

About

Recently, self-supervised pre-training has advanced Vision Transformers on various tasks across different data modalities, e.g., image and 3D point cloud data. In this paper, we explore this learning paradigm for 3D mesh data analysis based on Transformers. Since applying Transformer architectures to new modalities is usually non-trivial, we first adapt the Vision Transformer to 3D mesh data processing, yielding the Mesh Transformer. Specifically, we divide a mesh into several non-overlapping local patches, each containing the same number of faces, and use the 3D position of each patch's center point to form positional embeddings. Inspired by MAE, we explore how pre-training on 3D mesh data with the Transformer-based structure benefits downstream 3D mesh analysis tasks. We first randomly mask some patches of the mesh and feed the corrupted mesh into Mesh Transformers. Then, by reconstructing the information of the masked patches, the network learns discriminative representations for mesh data. We name our method MeshMAE; it yields state-of-the-art or comparable performance on mesh analysis tasks, i.e., classification and segmentation. We also conduct comprehensive ablation studies to show the effectiveness of the key designs in our method.
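
The patch-and-mask step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `patch_centers` and `random_mask`, the toy data, and the 0.5 mask ratio are all assumptions made for illustration, and each patch is represented simply as a list of face-center coordinates.

```python
import random


def patch_centers(patches):
    """Return the mean 3D point of each patch (a list of (x, y, z) face centers)."""
    centers = []
    for patch in patches:
        n = len(patch)
        centers.append(tuple(sum(p[i] for p in patch) / n for i in range(3)))
    return centers


def random_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) by sampling without replacement."""
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masked = sorted(rng.sample(range(num_patches), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_patches) if i not in masked_set]
    return visible, masked


# Toy mesh (hypothetical data): 4 non-overlapping patches with 2 faces each,
# mirroring the equal-size patches described in the abstract.
patches = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 2.0, 0.0), (0.0, 4.0, 0.0)],
    [(1.0, 1.0, 1.0), (3.0, 3.0, 3.0)],
    [(5.0, 0.0, 1.0), (5.0, 0.0, 3.0)],
]

# Patch centers would feed the positional embeddings.
centers = patch_centers(patches)

# The mask ratio of 0.5 is an assumed value, not one taken from the paper.
visible, masked = random_mask(len(patches), mask_ratio=0.5, seed=0)
```

In this sketch only the `visible` patches would be passed to the encoder, while the `masked` indices identify the patches whose information the network is trained to reconstruct.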

Yaqian Liang, Shanshan Zhao, Baosheng Yu, Jing Zhang, Fazhi He • 2022

Related benchmarks

Task	Dataset	Result
3D Object Classification	ModelNet40	--	89
3D Classification	ScanObjectNN OBJ-BG official	Accuracy 90	37
3D Object Classification	ScanObjectNN PB	Accuracy 85.2	29
3D Object Classification	ScanObjectNN ONLY	Accuracy 88.3	23
Dental Abutment Design	Oral Scan Dataset	Transgingival Distance 28.61	17
Diameter Regression	Oral scan data	IoU 65.16	8
Height Regression	Oral scan data	IoU 31.52	8
Transgingival Regression	Oral scan data	mIoU 42.84	8
