
Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning

About

Self-supervised learning excels at learning representations from large amounts of unlabeled data and has demonstrated success across multiple data modalities. Yet, extending self-supervised learning to new modalities is non-trivial because existing methods are tailored to each domain, for example through domain-specific augmentations that reflect the invariances of the target task. While masked modeling is promising as a domain-agnostic framework for self-supervised learning because it does not rely on input augmentations, its mask sampling procedure remains domain-specific. We present Self-guided Masked Autoencoders (SMA), a fully domain-agnostic masked modeling method. SMA trains an attention-based model with a masked modeling objective while learning which masks to sample, without any domain-specific assumptions. We evaluate SMA on three self-supervised learning benchmarks in protein biology, chemical property prediction, and particle physics. We find that SMA learns representations without domain-specific knowledge and achieves state-of-the-art performance on all three benchmarks.
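The core idea, deriving the mask from the model's own attention rather than from a domain-specific sampling heuristic, can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch illustration, not the authors' reference implementation: it assumes the mask is chosen by aggregating the attention each position receives and masking the top-k most-attended positions with a hard selection, then training with an MSE reconstruction loss on the masked positions. The class name `SelfGuidedMAE` and all hyperparameters are placeholders.

```python
# Minimal sketch of attention-derived masking for a masked autoencoder.
# Assumptions (not from the paper's code): masks come from column sums of an
# attention map, the top-k most-attended positions are masked, and the loss
# is MSE on masked token embeddings.
import torch
import torch.nn as nn


class SelfGuidedMAE(nn.Module):
    def __init__(self, dim=64, n_heads=4, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, n_heads, batch_first=True),
            num_layers=2,
        )
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.Linear(dim, dim)  # reconstruct token embeddings

    def forward(self, x):
        # x: (batch, seq_len, dim) token embeddings from any modality
        b, n, d = x.shape
        num_mask = max(1, int(self.mask_ratio * n))

        # 1. Compute an attention map over the input and aggregate the
        #    attention each position receives (column sums, averaged heads).
        _, attn_weights = self.attn(x, x, x, need_weights=True)
        scores = attn_weights.sum(dim=1)  # (batch, seq_len)

        # 2. Mask the most-attended positions. Note: hard top-k passes no
        #    gradient to the mask scorer, a simplification relative to the
        #    learned mask sampling described in the abstract.
        idx = scores.topk(num_mask, dim=-1).indices
        mask = torch.zeros(b, n, dtype=torch.bool, device=x.device)
        mask.scatter_(1, idx, True)

        # 3. Replace masked positions with a learned mask token and encode.
        x_masked = torch.where(
            mask.unsqueeze(-1), self.mask_token.expand(b, n, d), x
        )
        h = self.encoder(x_masked)

        # 4. Reconstruct the original embeddings at masked positions only.
        pred = self.decoder(h)
        return ((pred - x) ** 2)[mask].mean()


if __name__ == "__main__":
    model = SelfGuidedMAE(dim=64)
    tokens = torch.randn(8, 32, 64)  # any modality, pre-embedded tokens
    loss = model(tokens)
    loss.backward()
```

Under this sketch the masking decision is non-differentiable; a faithful implementation of the paper's learned mask sampling would let gradients flow into how masks are chosen.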

Johnathan Xie, Yoonho Lee, Annie S. Chen, Chelsea Finn • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Natural Language Understanding | GLUE (dev) | SST-2 (Acc) | 88.3 | 504 |
| Molecule property prediction | MoleculeNet (scaffold split) | BBBP | 75 | 58 |
| Regression | MoleculeNet (scaffold) | Lipo | 0.609 | 24 |
| Binary Classification | HIGGS small (test) | Accuracy (%) | 74.8 | 15 |
| Year prediction | YearPredictionMSD (test) | RMSE | 8.695 | 14 |
| Particle Physics Process Classification | HIGGS 1k (test) | Accuracy | 69.47 | 5 |
| Particle Physics Process Classification | HIGGS 10k (test) | Accuracy | 74.04 | 5 |
| Particle Physics Process Classification | HIGGS 100k (test) | Accuracy | 77.88 | 5 |
| Protein property prediction | TAPE (downstream) | Remote Homology | 0.23 | 4 |
