
Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning

About

Recent time-series foundation models exhibit strong abilities to predict physical systems. These abilities include zero-shot forecasting, in which a model forecasts future states of a system given only a short trajectory as context, without knowledge of the underlying physics. Here, we show that foundation models often forecast through a simple parroting strategy, and when they are not parroting they exhibit some shared failure modes such as converging to the mean. As a result, a naive context parroting model that copies directly from the context scores higher than leading time-series foundation models on predicting a diverse range of dynamical systems, including low-dimensional chaos, turbulence, coupled oscillators, and electrocardiograms, at a tiny fraction of the computational cost. We draw a parallel between context parroting and induction heads, which explains recent works showing that large language models can often be repurposed for time series forecasting. Our dynamical systems perspective also ties the scaling between forecast accuracy and context length to the fractal dimension of the underlying chaotic attractor, providing insight into previously observed in-context neural scaling laws. By revealing the performance gaps and failure modes of current time-series foundation models, context parroting can guide the design of future foundation models and help identify in-context learning strategies beyond parroting.

Yuanzhao Zhang, William Gilpin • 2025
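The abstract describes context parroting only as a model that "copies directly from the context". A minimal sketch of one plausible reading follows: find the earlier window in the context that best matches the most recent points, then replay whatever came after it. The function name `parrot_forecast`, the `motif_len` parameter, the squared-error match criterion, and the cyclic repetition for long horizons are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def parrot_forecast(context, horizon, motif_len=8):
    """Forecast by copying from the context (illustrative sketch, not the authors' code).

    context:   array of shape (T,) or (T, d), the observed trajectory.
    horizon:   number of future steps to produce.
    motif_len: length of the trailing window used as the search query (assumed parameter).
    """
    context = np.asarray(context, dtype=float)
    was_1d = context.ndim == 1
    if was_1d:
        context = context[:, None]
    n = len(context)
    if n <= motif_len:
        raise ValueError("context must be longer than motif_len")

    # The query is the most recent motif_len points of the context.
    query = context[-motif_len:]

    # Compare the query with every earlier window that has at least one successor.
    dists = [
        np.sum((context[s:s + motif_len] - query) ** 2)
        for s in range(n - motif_len)
    ]
    best = int(np.argmin(dists))

    # Copy everything that followed the best match; repeat it if the horizon is longer.
    segment = context[best + motif_len:]
    reps = -(-horizon // len(segment))  # ceiling division
    forecast = np.tile(segment, (reps, 1))[:horizon]
    return forecast[:, 0] if was_1d else forecast


# Usage: a periodic signal is continued by replaying an earlier cycle.
t = np.arange(100)
signal = np.sin(2 * np.pi * t / 20)
prediction = parrot_forecast(signal, horizon=30)
```

Because the sketch only searches and copies, it needs no training and its cost grows with the context length rather than with a model size, which is consistent with the abstract's remark about computational cost.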

Related benchmarks

Task | Dataset | Metric | Result | Rank
Forecasting | ECG | KL Divergence | 0.065 | 7
Forecasting | Kuramoto | KL Divergence | 0.001 | 7
Forecasting | ECG (QT Database in PhysioNet) | MAE | 0.624 | 7
Forecasting | Circuit | MAE | 0.083 | 7
Forecasting | Kuramoto | MAE (Kuramoto) | 0.004 | 7
Forecasting | Circuit SciML (test) | MSE | 0.012 | 7
Forecasting | Kuramoto SciML (test) | MSE | 0.001 | 7
Forecasting | Turbulence | KL Divergence | 0.028 | 7
Forecasting | Circuit | KL Divergence | 0.572 | 7
Forecasting | Turbulence (von Karman vortex street) Re = 900 | MAE | 0.403 | 7
Showing 10 of 13 rows
