
CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data

About

Time series foundation models (TSFMs) have recently gained significant attention due to their strong zero-shot capabilities and widespread real-world applications. Such models typically require computationally costly pre-training on large-scale, carefully curated collections of real-world sequences. To enable sample-efficient pre-training of TSFMs, we propose CauKer, a novel algorithm designed to generate diverse, causally coherent synthetic time series with realistic trends, seasonality, and nonlinear interactions. CauKer combines Gaussian Process (GP) kernel composition with Structural Causal Models (SCMs) to produce data for sample-efficient pre-training of state-of-the-art classification TSFMs with different architectures and pre-training approaches. Additionally, our experiments reveal that CauKer-generated datasets exhibit clear scaling laws for both dataset size (10K to 10M samples) and model capacity (1M to 783M parameters), unlike real-world datasets, which display irregular scaling behavior. The source code is publicly available at https://github.com/ShifengXIE/CauKer.
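The abstract describes the generator only at a high level: compose GP kernels to get realistic components (trend, seasonality), then wire the resulting signals together through an SCM with nonlinear interactions. The following is a minimal NumPy sketch of what such a pipeline could look like; the base kernels, composition depth, random DAG sampling, and tanh mechanisms are illustrative assumptions, not the paper's actual procedure (see the repository above for that).

```python
# Illustrative sketch only: kernel choices, SCM mechanisms, and all
# hyperparameters below are assumptions, not CauKer's actual settings.
import numpy as np

rng = np.random.default_rng(0)
T = 256
t = np.linspace(0.0, 1.0, T)[:, None]

# Base GP kernels capturing seasonality, smooth variation, and trend.
def rbf(x, y, ls=0.1):
    return np.exp(-0.5 * ((x - y.T) / ls) ** 2)

def periodic(x, y, ls=0.5, p=0.2):
    return np.exp(-2.0 * np.sin(np.pi * np.abs(x - y.T) / p) ** 2 / ls ** 2)

def linear(x, y, c=0.5):
    return (x - c) @ (y - c).T

BASE = [rbf, periodic, linear]

def sample_composite_kernel(depth=2):
    """Randomly compose base kernels via sums/products (kernel composition)."""
    k = BASE[rng.integers(len(BASE))]
    for _ in range(depth - 1):
        k2 = BASE[rng.integers(len(BASE))]
        if rng.random() < 0.5:
            k = lambda a, b, k=k, k2=k2: k(a, b) + k2(a, b)
        else:
            k = lambda a, b, k=k, k2=k2: k(a, b) * k2(a, b)
    return k

def sample_gp(kernel):
    """Draw one series from a zero-mean GP with the given kernel."""
    K = kernel(t, t) + 1e-6 * np.eye(T)  # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(T), K)

def sample_scm_series(n_nodes=4):
    """SCM over a random DAG: roots are GP draws, children are nonlinear
    functions of their parents plus a GP innovation term."""
    values = {}
    for i in range(n_nodes):
        parents = [j for j in range(i) if rng.random() < 0.5]
        gp = sample_gp(sample_composite_kernel())
        if not parents:
            values[i] = gp
        else:
            mix = sum(np.tanh(values[j]) * rng.normal() for j in parents)
            values[i] = mix + 0.3 * gp
    return values[n_nodes - 1]  # observe a sink node of the DAG

x = sample_scm_series()
print(x.shape)  # (256,)
```

In this reading, kernel composition supplies realistic marginal structure for each variable, while the SCM supplies causal coherence across variables; labels for classification pre-training could then be derived from the sampled causal configuration, though the abstract does not spell out that step.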

Shifeng Xie, Vasilii Feofanov, Ambroise Odonnat, Lei Zan, Marius Alonso, Jianfeng Zhang, Themis Palpanas, Lujia Pan, Keli Zhang, Ievgen Redko • 2025

Related benchmarks

Task                                             | Dataset                 | Metric    | Result | Rank
Forecasting                                      | Chronos zero-shot suite | MASE      | 0.81   | 9
Multivariate clinical time-series classification | P12 (test)              | AUROC     | 81.89  | 3
Multivariate clinical time-series classification | P19 (test)              | AUROC     | 0.8846 | 3
Zero-shot Classification                         | WOODS domain averages   | CAP (EEG) | 78.2   | 3
