
Semi-Supervised Masked Autoencoders: Unlocking Vision Transformer Potential with Limited Data

About

We address the challenge of training Vision Transformers (ViTs) when labeled data is scarce but unlabeled data is abundant. We propose Semi-Supervised Masked Autoencoder (SSMAE), a framework that jointly optimizes masked image reconstruction and classification using both unlabeled and labeled samples with dynamically selected pseudo-labels. SSMAE introduces a validation-driven gating mechanism that activates pseudo-labeling only after the model achieves reliable, high-confidence predictions that are consistent across both weakly and strongly augmented views of the same image, reducing confirmation bias. On CIFAR-10 and CIFAR-100, SSMAE consistently outperforms supervised ViT and fine-tuned MAE, with the largest gains in low-label regimes (+9.24% over ViT on CIFAR-10 with 10% labels). Our results demonstrate that when pseudo-labels are introduced is as important as how they are generated for data-efficient transformer training. Code is available at https://github.com/atik666/ssmae.
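The gating idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the validation-accuracy threshold, the confidence threshold, and the simple "both views agree" rule are assumptions chosen for clarity.

```python
def gate_open(val_accuracy, acc_threshold=0.7):
    """Pseudo-labeling stays disabled until the model's validation
    accuracy suggests its predictions are reliable (assumed threshold)."""
    return val_accuracy >= acc_threshold


def select_pseudo_labels(weak_probs, strong_probs, conf_threshold=0.95):
    """Keep an unlabeled sample only if the weakly and strongly augmented
    views predict the same class AND the weak-view confidence is high.

    weak_probs / strong_probs: per-sample class-probability lists.
    Returns a list of (sample_index, pseudo_label) pairs.
    """
    selected = []
    for i, (w, s) in enumerate(zip(weak_probs, strong_probs)):
        w_cls = max(range(len(w)), key=w.__getitem__)
        s_cls = max(range(len(s)), key=s.__getitem__)
        if w_cls == s_cls and w[w_cls] >= conf_threshold:
            selected.append((i, w_cls))
    return selected


# Example: sample 0 passes (views agree, confidence 0.97);
# sample 1 is rejected (the two views disagree).
weak = [[0.97, 0.02, 0.01], [0.50, 0.40, 0.10]]
strong = [[0.90, 0.05, 0.05], [0.10, 0.60, 0.30]]
if gate_open(val_accuracy=0.8):
    print(select_pseudo_labels(weak, strong))  # -> [(0, 0)]
```

In the full method, the selected pseudo-labels would feed the classification loss alongside the masked-reconstruction objective; here only the selection step is shown.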

Atik Faysal, Mohammad Rostami, Reihaneh Gh. Roshan, Nikhil Muralidhar, Huaxia Wang • 2026

Related benchmarks

Task                  Dataset                        Result              Rank
Classification        CIFAR-100 (10% labeled)        Accuracy 22.65      46
Image Classification  CIFAR-10 (10% labeled, test)   All Accuracy 56.8   11
Classification        CIFAR-100 (20% labeled)        Accuracy 32.41      4
Classification        CIFAR-100 (30% labeled)        Accuracy 35.31      4
Classification        CIFAR-100 (40% labeled)        Accuracy 41.27      4
Image Classification  CIFAR-10 (20% labeled)         Accuracy 0.664      4
Image Classification  CIFAR-10 (30% labeled)         Accuracy 71.83      4
Image Classification  CIFAR-10 (40% labeled)         Accuracy 74.67      4
