
MOSAIC: Module Discovery via Sparse Additive Identifiable Causal Learning for Scientific Time Series

About

Causal representation learning (CRL) seeks to recover latent variables with identifiability guarantees, typically up to permutation and component-wise reparameterization under appropriate assumptions. However, identifiability does not imply interpretability: latent semantics are typically assigned post hoc by alignment with known ground-truth factors. This limitation is particularly acute in scientific time series, where underlying mechanisms are unknown and discovering interpretable structure is a primary goal. In contrast, scientific observations (such as residue-pair distances, climate indices, or process sensors) are inherently semantic, as they correspond to named physical quantities. This raises a key question: can the interpretability of observations be transferred to the identifiable latent space? We propose MOSAIC (Module discovery via Sparse Additive Identifiable Causal learning), a sparse temporal VAE that integrates temporal CRL identifiability with support recovery over observed variables. MOSAIC identifies latent variables via regime-conditioned temporal variation, and recovers for each latent a sparse set of associated observations through an additive decoder, yielding module-level interpretability. We show that ANOVA main-effect supports are identifiable under general smooth mixing functions, and provide finite-sample recovery guarantees for a tractable sparse-additive variant. Empirically, MOSAIC recovers domain-consistent variable groups across RNA molecular dynamics, solar wind, ENSO climate, the Tennessee Eastman process, and a synthetic tokamak benchmark, enabling interpretable discovery of latent mechanisms in scientific time series.
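The support-recovery idea in the abstract can be illustrated with a toy additive decoder. The following is a minimal sketch under strong assumptions, not the MOSAIC training procedure: the latents are treated as already known (MOSAIC infers them with a temporal VAE), each per-latent function is a centered cubic polynomial, and sparsity is read off by thresholding after an ordinary least-squares fit rather than by a sparsity penalty. The names `basis`, `fit_additive_decoder`, `recover_supports`, and the threshold `tol` are hypothetical.

```python
import numpy as np

def basis(z, degree=3):
    """Centered polynomial features for each latent: shape (T, K, degree)."""
    feats = np.stack([z ** d for d in range(1, degree + 1)], axis=-1)
    return feats - feats.mean(axis=0, keepdims=True)

def fit_additive_decoder(z, x, degree=3):
    """Least-squares fit of x_j ~ sum_k f_jk(z_k). Returns W with shape (D, K, degree)."""
    T, K = z.shape
    phi = basis(z, degree).reshape(T, K * degree)
    xc = x - x.mean(axis=0, keepdims=True)
    coef, *_ = np.linalg.lstsq(phi, xc, rcond=None)  # shape (K * degree, D)
    return coef.T.reshape(x.shape[1], K, degree)

def recover_supports(z, W, degree=3, tol=0.1):
    """Boolean (D, K) mask: True where the fitted f_jk varies noticeably over the data."""
    phi = basis(z, degree)                        # (T, K, degree)
    contrib = np.einsum("tkb,jkb->tjk", phi, W)   # value of f_jk at each time step
    return contrib.std(axis=0) > tol

# Toy data (assumption): 2 latents, 5 observed variables, known sparse supports.
rng = np.random.default_rng(0)
T, K, D = 2000, 2, 5
z = rng.normal(size=(T, K))
true_support = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=bool)

x = np.zeros((T, D))
x[:, 0] = 1.5 * z[:, 0]
x[:, 1] = z[:, 0] ** 3 - z[:, 0]
x[:, 2] = z[:, 1] ** 2
x[:, 3] = -2.0 * z[:, 1]
x[:, 4] = z[:, 1] + 0.5 * z[:, 1] ** 3
x += 0.05 * rng.normal(size=x.shape)              # small observation noise

W = fit_additive_decoder(z, x)
support = recover_supports(z, W)
print(support.astype(int))                        # rows: observations, columns: latents
```

Each column of `support` lists the observed variables attached to one latent, which is the sense in which named observations lend their meaning to a latent module. Centering the per-latent functions mirrors the usual convention that main effects have zero mean.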

Shicheng Fan, Nour Elhendawy, Jianle Sun, Ke Fang, Kun Zhang, Yihang Wang, Lu Cheng • 2026

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Regime-associated latent factor identification | Rna | Regime Accuracy 91.2 | 11
Latent Factor Identification | Physics-Inspired Synthetic Energy-Landscape Monotonic Nonlinear Mixing | MCC 0.912 | 10
Regime classification and latent localization | cUUCGg tetraloop RNA molecular dynamics (MD) simulations | Regime Accuracy 91.2 | 4
Regime identification | OMNI | -- | 4
Regime identification | Disruption | -- | 4
Regime identification | Climate | -- | 4
Cross-domain Localization | OMNI | Regime Accuracy 94 | 1
Cross-domain Localization | Disruption | Regime Accuracy 91.1 | 1
Cross-domain Localization | Climate (ENSO) | Regime Accuracy 88.8 | 1
Cross-domain Localization | TEP | Regime Accuracy 72.9 | 1
Showing 10 of 13 rows
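
The MCC result above is not defined on this page. Assuming it denotes the mean correlation coefficient commonly used to score latent recovery up to permutation and component-wise reparameterization, a minimal sketch of the computation follows. The function name `mean_correlation_coefficient` is hypothetical, the permutation search is brute force (fine for a handful of latents), and Pearson correlation is used, which is only invariant to affine component-wise maps; a rank correlation would be the natural swap for monotone nonlinear maps.

```python
import itertools
import numpy as np

def mean_correlation_coefficient(z_true, z_hat):
    """Mean absolute Pearson correlation under the best one-to-one latent matching.

    z_true, z_hat: arrays of shape (T, K) with the same number of latents K.
    """
    K = z_true.shape[1]
    # Cross-correlation block between true latents (rows) and estimates (columns).
    corr = np.abs(np.corrcoef(z_true.T, z_hat.T)[:K, K:])
    best = max(
        sum(corr[i, perm[i]] for i in range(K))
        for perm in itertools.permutations(range(K))
    )
    return best / K

rng = np.random.default_rng(0)
z_true = rng.normal(size=(1000, 3))

# Estimates that are a permuted, rescaled, shifted copy of the true latents.
z_hat = np.stack(
    [-2.0 * z_true[:, 2], 0.5 * z_true[:, 0] + 1.0, 3.0 * z_true[:, 1]], axis=1
)
mcc_matched = mean_correlation_coefficient(z_true, z_hat)

# Estimates unrelated to the true latents, for contrast.
mcc_random = mean_correlation_coefficient(z_true, rng.normal(size=(1000, 3)))
print(mcc_matched, mcc_random)
```

A score near 1 means every true latent is matched by some estimated latent up to sign, scale, and ordering; unrelated estimates score near 0.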
