Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
About
Featurizing microscopy images for use in biological research remains a significant challenge, especially for large-scale experiments spanning millions of images. This work explores the scaling properties of weakly supervised classifiers and self-supervised masked autoencoders (MAEs) when training with increasingly large model backbones and microscopy datasets. Our results show that ViT-based MAEs outperform weakly supervised classifiers on a variety of tasks, achieving as much as an 11.5% relative improvement when recalling known biological relationships curated from public databases. Additionally, we develop a new channel-agnostic MAE architecture (CA-MAE) that accepts images with different numbers and orders of channels at inference time. We demonstrate that CA-MAEs generalize effectively by running inference and evaluation on a microscopy image dataset (JUMP-CP) generated under different experimental conditions and with a different channel structure than our pretraining data (RPI-93M). Our findings motivate continued research into scaling self-supervised learning on microscopy data in order to create powerful foundation models of cellular biology that have the potential to catalyze advancements in drug discovery and beyond.
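The abstract does not spell out how channel-agnostic input or masking is implemented. As a rough illustration only, the sketch below assumes one plausible approach: each channel is split into patches independently, so the token count grows with the number of channels, and a random subset of tokens is then hidden as in a generic masked autoencoder. The function names (`patchify_per_channel`, `random_mask`, `make_image`), the channel counts, the patch size, and the 0.75 mask ratio are all hypothetical choices for this example, not details taken from the paper.

```python
import random


def patchify_per_channel(image, patch):
    """Split a [C][H][W] nested-list image into per-channel patch tokens.

    Each token is (channel_index, patch_row, patch_col, flat_pixels), so the
    number of tokens scales with the channel count instead of being fixed.
    """
    tokens = []
    for c, plane in enumerate(image):
        height, width = len(plane), len(plane[0])
        if height % patch or width % patch:
            raise ValueError("image size must be divisible by patch size")
        for r in range(0, height, patch):
            for col in range(0, width, patch):
                flat = [plane[r + i][col + j]
                        for i in range(patch) for j in range(patch)]
                tokens.append((c, r // patch, col // patch, flat))
    return tokens


def random_mask(tokens, mask_ratio, seed=0):
    """Return (visible, masked) token lists after hiding a random subset."""
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)
    n_masked = int(len(tokens) * mask_ratio)
    masked_ids = set(order[:n_masked])
    visible = [t for i, t in enumerate(tokens) if i not in masked_ids]
    masked = [t for i, t in enumerate(tokens) if i in masked_ids]
    return visible, masked


def make_image(channels, size):
    """Build a toy image with distinct integer pixel values (illustrative)."""
    return [[[c * 100 + y * size + x for x in range(size)]
             for y in range(size)]
            for c in range(channels)]


# Hypothetical channel counts: the same code path handles both images.
six = patchify_per_channel(make_image(6, 4), patch=2)
five = patchify_per_channel(make_image(5, 4), patch=2)

# Assumed mask ratio; only the visible tokens would reach an encoder.
visible, masked = random_mask(six, mask_ratio=0.75)
```

Because every token carries its own channel index, nothing in this sketch depends on a fixed channel count or order, which is the property the abstract attributes to CA-MAE. How the actual model embeds and reconstructs channels may differ.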
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| MoA classification | JUMP-CP Source S8 (test) | Accuracy | 84 | 17 |
| MoA classification | JUMP-CP Source S3 (test) | Accuracy | 74.7 | 17 |
| MoA prediction | JUMP-CP Source 3 (new experimental batch) | Accuracy | 70.9 | 13 |
| 3D Volumetric Microscopy Image Reconstruction | FVCD (test) | PSNR (dB) | 34.01 | 4 |
| MoA prediction | JUMP-CP Source 3 (in-domain) | Accuracy | 74.7 | 4 |