
Giving Sensors a Voice: Multimodal JEPA for Semantic Time-Series Embeddings

About

Transformer-based architectures have advanced sequence modeling in language and vision, yet general-purpose representation learning for heterogeneous multivariate time series remains underexplored. We introduce CHARM (Channel-Aware Representation Model), which incorporates channel-level textual descriptions into a Transformer encoder equivariant to channel order. CHARM is trained with a Joint Embedding Predictive Architecture (JEPA) and a novel loss promoting informative, temporally stable embeddings; latent-space prediction encourages robustness to sensor noise while description-aware gating provides interpretability through learned inter-channel relationships. Across anomaly detection, classification, and short- and long-term forecasting, the learned embeddings achieve strong performance using only a linear probe. Performance is driven primarily by the JEPA objective and conditioning architecture, with text descriptions serving as channel identifiers for cross-dataset generalization.
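To make the phrase "equivariant to channel order" concrete, the following is a minimal toy sketch, not CHARM's architecture. It assumes a hypothetical layer that summarizes each channel and subtracts a pooled summary over all channels; the names `equivariant_layer`, `position_dependent_layer`, and `is_equivariant` are illustrative and do not come from the paper. A layer is equivariant when reordering the input channels reorders the outputs in the same way and changes nothing else.

```python
def equivariant_layer(channels):
    """Toy layer: per-channel mean minus the mean pooled over all channels.

    The pooled term is symmetric in the channels, so reordering the input
    only reorders the output.
    """
    summaries = [sum(ch) / len(ch) for ch in channels]
    pooled = sum(summaries) / len(summaries)
    return [s - pooled for s in summaries]


def position_dependent_layer(channels):
    """Counterexample: weights each channel by its position, so order matters."""
    return [(i + 1) * sum(ch) / len(ch) for i, ch in enumerate(channels)]


def permute(items, order):
    """Reorder items so that position k holds items[order[k]]."""
    return [items[i] for i in order]


def is_equivariant(layer, channels, order, tol=1e-9):
    """Check layer(permute(x)) == permute(layer(x)) up to a small tolerance."""
    permuted_then_layer = layer(permute(channels, order))
    layer_then_permuted = permute(layer(channels), order)
    return all(
        abs(a - b) <= tol
        for a, b in zip(permuted_then_layer, layer_then_permuted)
    )


# Three hypothetical sensor channels with three time steps each.
channels = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0], [0.0, 4.0, 8.0]]
order = [2, 0, 1]

print(is_equivariant(equivariant_layer, channels, order))         # True
print(is_equivariant(position_dependent_layer, channels, order))  # False
```

Under this property a model does not depend on the arbitrary order in which sensors appear in a dataset, which is presumably why the abstract pairs it with text descriptions that act as channel identifiers.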

Utsav Dutta, Gerardo Pastrana, Sina Khoshfetrat Pakazad, Henrik Ohlsson • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Time Series Forecasting | ETTm1 | MSE | 0.411 | 363 |
| Anomaly Detection | UCR | F1 Score | 75.4 | 28 |
| Multivariate Time Series Classification | UEA | Average Accuracy | 80.9 | 18 |
| Forecasting | Exchange Rate | MSE | 0.092 | 16 |
| Time Series Forecasting | Weather | MSE | 0.222 | 14 |
| Multivariate Anomaly Detection | SKAB | F1 Score | 86 | 6 |
| Classification | UCI Hydraulic Systems (unseen) | Valve Condition Accuracy (4-class) | 99.8 | 2 |