
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input

About

Masked Autoencoders is a simple yet powerful self-supervised learning method. However, it learns representations indirectly by reconstructing masked input patches. Several methods learn representations directly by predicting representations of masked patches; however, we think using all patches to encode training signal representations is suboptimal. We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals using only masked patches. In the M2D, the online network encodes visible patches and predicts masked patch representations, and the target network, a momentum encoder, encodes masked patches. To better predict target representations, the online network should model the input well, while the target network should also model it well to agree with online predictions. Then the learned representations should better model the input. We validated the M2D by learning general-purpose audio representations, and M2D set new state-of-the-art performance on tasks such as UrbanSound8K, VoxCeleb1, AudioSet20K, GTZAN, and SpeechCommandsV2. We additionally validate the effectiveness of M2D for images using ImageNet-1K in the appendix.

Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino • 2022
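
The online/target arrangement described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the single-matrix "encoders", the pooled-context predictor, the shared embedding width, the normalized squared-error loss, and every name below (`split_patches`, `m2d_loss`, `ema_update`, `pos`, the momentum value) are assumptions made for illustration. Only the data flow follows the abstract: the online network sees visible patches and predicts masked-patch representations, while the momentum (target) encoder sees only masked patches.

```python
import numpy as np

def split_patches(num_patches, mask_ratio, rng):
    """Randomly partition patch indices into visible and masked sets."""
    perm = rng.permutation(num_patches)
    num_masked = int(round(num_patches * mask_ratio))
    masked_idx = np.sort(perm[:num_masked])
    visible_idx = np.sort(perm[num_masked:])
    return visible_idx, masked_idx

def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def m2d_loss(patches, pos, visible_idx, masked_idx, online_w, predictor_w, target_w):
    """Toy forward pass; all embeddings share one width for brevity."""
    # Online network: encodes visible patches only.
    z_visible = np.tanh((patches[visible_idx] + pos[visible_idx]) @ online_w)
    # Toy predictor (assumption): pooled visible context plus the position
    # of each masked patch, mapped to one prediction per masked patch.
    queries = z_visible.mean(axis=0) + pos[masked_idx]
    pred = queries @ predictor_w
    # Target network: encodes masked patches only.
    target = np.tanh((patches[masked_idx] + pos[masked_idx]) @ target_w)
    # Assumed loss: squared error between L2-normalized vectors.
    diff = l2_normalize(pred) - l2_normalize(target)
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

def ema_update(target_w, online_w, momentum=0.99):
    """Momentum (exponential moving average) update of the target weights."""
    return momentum * target_w + (1.0 - momentum) * online_w

rng = np.random.default_rng(0)
num_patches, dim = 16, 8
patches = rng.normal(size=(num_patches, dim))   # stand-in patch embeddings
pos = rng.normal(size=(num_patches, dim))       # stand-in positional embeddings
online_w = rng.normal(size=(dim, dim))
predictor_w = rng.normal(size=(dim, dim))
target_w = online_w.copy()                      # target starts as a copy of online

visible_idx, masked_idx = split_patches(num_patches, 0.6, rng)
loss = m2d_loss(patches, pos, visible_idx, masked_idx, online_w, predictor_w, target_w)
target_w = ema_update(target_w, online_w)
```

The sketch omits the gradient step on the online weights; in a real training loop that step would come between computing the loss and calling `ema_update`, and the target weights would receive no gradients.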

Related benchmarks

Task                                Dataset             Metric          Result  Rank
Audio Classification                ESC-50              Accuracy        95      325
Audio Classification                AudioSet 20K        mAP             38.6    128
Audio Classification                UrbanSound8K        Accuracy        87.6    116
Audio Classification                AudioSet 2M         mAP             47.9    79
Musical Instrument Classification   NSynth              Accuracy        76.9    75
Image Classification                ImageNet-1k (val)   Top-1 Accuracy  83.35   65
Audio Classification                SPC V2              Accuracy        95.4    65
Keyword Spotting                    Speech Commands V2  Accuracy        98.5    61
Environmental Sound Classification  FSD50K              mAP             52.8    60
Speaker Identification              VoxCeleb1           Accuracy        96.5    58
Showing 10 of 23 rows
