Modelling Cellular Perturbations with the Sparse Additive Mechanism Shift Variational Autoencoder
About
Generative models of observations under interventions have been a vibrant topic of interest across machine learning and the sciences in recent years. For example, in drug discovery, there is a need to model the effects of diverse interventions on cells in order to characterize unknown biological mechanisms of action. We propose the Sparse Additive Mechanism Shift Variational Autoencoder, SAMS-VAE, to combine compositionality, disentanglement, and interpretability for perturbation models. SAMS-VAE models the latent state of a perturbed sample as the sum of a local latent variable capturing sample-specific variation and sparse global variables of latent intervention effects. Crucially, SAMS-VAE sparsifies these global latent variables for individual perturbations to identify disentangled, perturbation-specific latent subspaces that are flexibly composable. We evaluate SAMS-VAE both quantitatively and qualitatively on a range of tasks using two popular single-cell sequencing datasets. In order to measure perturbation-specific model properties, we also introduce a framework for evaluation of perturbation models based on average treatment effects with links to posterior predictive checks. SAMS-VAE outperforms comparable models in terms of generalization across in-distribution and out-of-distribution tasks, including a combinatorial reasoning task under resource paucity, and yields interpretable latent structures which correlate strongly with known biological mechanisms. Our results suggest SAMS-VAE is an interesting addition to the modeling toolkit for machine learning-driven scientific discovery.
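The additive latent structure described above can be illustrated with a small NumPy sketch. It is only one reading of the abstract, not the authors' implementation: the function name `compose_latent`, the array names, and the factorization of each sparse global variable into a binary mask times a dense embedding are assumptions made here, and the values are fixed by hand, whereas a trained model would infer them and pass the result through a decoder.

```python
import numpy as np

def compose_latent(z_basal, treatments, masks, embeddings):
    """Add sparse, perturbation-specific shifts to a sample-specific latent state.

    z_basal:    (n_samples, n_latent) local latent variable per sample
    treatments: (n_samples, n_perturbations) indicator of applied perturbations
    masks:      (n_perturbations, n_latent) binary sparsity mask (assumed form)
    embeddings: (n_perturbations, n_latent) dense perturbation effect (assumed form)
    """
    # Masking zeroes out every latent dimension a perturbation does not use.
    sparse_effects = masks * embeddings
    # Each sample receives the sum of the effects of the perturbations applied to it.
    return z_basal + treatments @ sparse_effects

rng = np.random.default_rng(0)
z_basal = rng.normal(size=(3, 4))

# Hypothetical values: perturbation 0 acts on latent dim 0, perturbation 1 on dim 2.
masks = np.array([[1, 0, 0, 0],
                  [0, 0, 1, 0]])
embeddings = np.array([[2.0, 5.0, 5.0, 5.0],
                       [7.0, 7.0, -1.0, 7.0]])

# Sample 0 is a control, sample 1 gets perturbation 0, sample 2 gets both.
treatments = np.array([[0, 0],
                       [1, 0],
                       [1, 1]])

z = compose_latent(z_basal, treatments, masks, embeddings)
shift = z - z_basal
print(np.round(shift, 6))
```

In this toy setup the two perturbations occupy disjoint latent dimensions, so the shift of the doubly perturbed sample is the sum of the two single-perturbation shifts. That is the sense in which sparse, perturbation-specific subspaces compose.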
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Perturbation response modeling | Srivatsan20 | Cosine logFC | 0.53 | 20 |
| Perturbation response modelling | Norman19 | Cosine logFC | 0.78 | 20 |
| Perturbation response modelling | Jiang24 | Cosine logFC | 0.59 | 19 |
| Combo prediction | Norman 19 | MMD GEX | 4.1 | 14 |
| Covariate transfer task | Srivatsan20 (test) | MMD GEX | 2.5 | 14 |
| Unseen Cell Prediction | Srivatsan-Sciplex3 2020 (OOD) | CosLogFC | 0.07 | 4 |
| Unseen Perturbation Prediction | Norman 2019 (OOD) | CosLogFC | 0.58 | 4 |
| Unseen Perturbation Prediction | Adamson 2016 (OOD) | CosLogFC | 0.51 | 4 |
| Unseen Perturbation Prediction | Srivatsan-Sciplex2 2020 (OOD) | CosLogFC | 0.12 | 4 |