
CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data

About

Time series foundation models (TSFMs) have recently gained significant attention due to their strong zero-shot capabilities and widespread real-world applications. Such models typically require computationally costly pre-training on large-scale, carefully curated collections of real-world sequences. To enable sample-efficient pre-training of TSFMs, we propose CauKer, a novel algorithm designed to generate diverse, causally coherent synthetic time series with realistic trends, seasonality, and nonlinear interactions. CauKer combines Gaussian Process (GP) kernel composition with Structural Causal Models (SCMs) to produce data for sample-efficient pre-training of state-of-the-art classification TSFMs with different architectures and pre-training approaches. Additionally, our experiments reveal that CauKer-generated datasets exhibit clear scaling laws for both dataset size (10K to 10M samples) and model capacity (1M to 783M parameters), unlike real-world datasets, which display irregular scaling behavior. The source code is publicly available at https://github.com/ShifengXIE/CauKer.
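The abstract describes the generator only at a high level: compose GP kernels to get realistic components (trend, seasonality), then wire the resulting signals together through an SCM with nonlinear interactions. The following is a minimal NumPy sketch of what such a pipeline could look like; the base kernels, composition depth, random DAG sampling, and tanh mechanisms are illustrative assumptions, not the paper's actual procedure (see the repository above for that).

```python
# Illustrative sketch only: kernel choices, SCM mechanisms, and all
# hyperparameters below are assumptions, not CauKer's actual settings.
import numpy as np

rng = np.random.default_rng(0)
T = 256
t = np.linspace(0.0, 1.0, T)[:, None]

# Base GP kernels capturing seasonality, smooth variation, and trend.
def rbf(x, y, ls=0.1):
    return np.exp(-0.5 * ((x - y.T) / ls) ** 2)

def periodic(x, y, ls=0.5, p=0.2):
    return np.exp(-2.0 * np.sin(np.pi * np.abs(x - y.T) / p) ** 2 / ls ** 2)

def linear(x, y, c=0.5):
    return (x - c) @ (y - c).T

BASE = [rbf, periodic, linear]

def sample_composite_kernel(depth=2):
    """Randomly compose base kernels via sums/products (kernel composition)."""
    k = BASE[rng.integers(len(BASE))]
    for _ in range(depth - 1):
        k2 = BASE[rng.integers(len(BASE))]
        if rng.random() < 0.5:
            k = lambda a, b, k=k, k2=k2: k(a, b) + k2(a, b)
        else:
            k = lambda a, b, k=k, k2=k2: k(a, b) * k2(a, b)
    return k

def sample_gp(kernel):
    """Draw one series from a zero-mean GP with the given kernel."""
    K = kernel(t, t) + 1e-6 * np.eye(T)  # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(T), K)

def sample_scm_series(n_nodes=4):
    """SCM over a random DAG: roots are GP draws, children are nonlinear
    functions of their parents plus a GP innovation term."""
    values = {}
    for i in range(n_nodes):
        parents = [j for j in range(i) if rng.random() < 0.5]
        gp = sample_gp(sample_composite_kernel())
        if not parents:
            values[i] = gp
        else:
            mix = sum(np.tanh(values[j]) * rng.normal() for j in parents)
            values[i] = mix + 0.3 * gp
    return values[n_nodes - 1]  # observe a sink node of the DAG

x = sample_scm_series()
print(x.shape)  # (256,)
```

In this reading, kernel composition supplies realistic marginal structure for each variable, while the SCM supplies causal coherence across variables; labels for classification pre-training could then be derived from the sampled causal configuration, though the abstract does not spell out that step.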

Shifeng Xie, Vasilii Feofanov, Ambroise Odonnat, Lei Zan, Marius Alonso, Jianfeng Zhang, Themis Palpanas, Lujia Pan, Keli Zhang, Ievgen Redko • 2025

Related benchmarks

Task                                             | Dataset                 | Metric    | Result | Rank
Forecasting                                      | Chronos zero-shot suite | MASE      | 0.81   | 9
Multivariate clinical time-series classification | P12 (test)              | AUROC     | 81.89  | 3
Multivariate clinical time-series classification | P19 (test)              | AUROC     | 0.8846 | 3
Zero-shot Classification                         | WOODS domain averages   | CAP (EEG) | 78.2   | 3
