ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data
About
Human experts typically integrate numerical and textual multimodal information to analyze time series. However, most traditional deep learning predictors rely solely on unimodal numerical data, using a fixed-length window for training and prediction on a single dataset, and cannot adapt to different scenarios. Powerful pre-trained large language models have introduced new opportunities for time series analysis. Yet existing methods are either inefficient to train, incapable of handling textual information, or lack zero-shot forecasting capability. In this paper, we model time series as a foreign language and construct ChatTime, a unified framework for time series and text processing. As an out-of-the-box multimodal time series foundation model, ChatTime provides zero-shot forecasting capability and supports bimodal input/output for both time series and text. We design a series of experiments to verify the performance of ChatTime across multiple tasks and scenarios, and create four multimodal datasets to address data gaps. The experimental results demonstrate the potential and utility of ChatTime.
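The abstract does not spell out how a time series becomes a "foreign language". One common way to make the idea concrete is to scale the values into a fixed range and map each one to a discrete token that a language model vocabulary could hold. The sketch below is illustrative only and is not ChatTime's actual tokenizer: the function names `series_to_tokens` and `tokens_to_series`, the `<ts_i>` token format, the bin count, and the `[-1, 1]` range are all assumptions made for the example.

```python
import numpy as np

def series_to_tokens(values, num_bins=10, low=-1.0, high=1.0):
    """Hypothetical helper: min-max scale a series to [low, high], then
    replace each value with a token naming the uniform bin it falls into."""
    values = np.asarray(values, dtype=float)
    vmin, vmax = values.min(), values.max()
    span = vmax - vmin
    if span == 0:
        # Constant series: place every point at zero (inside the default range).
        scaled = np.zeros_like(values)
    else:
        scaled = low + (values - vmin) * (high - low) / span
    # Interior bin edges; np.digitize then yields ids in 0..num_bins-1.
    edges = np.linspace(low, high, num_bins + 1)[1:-1]
    ids = np.digitize(scaled, edges)
    tokens = [f"<ts_{i}>" for i in ids]
    return tokens, (vmin, vmax)

def tokens_to_series(tokens, scale, num_bins=10, low=-1.0, high=1.0):
    """Hypothetical inverse: map each token to its bin center and undo the scaling."""
    vmin, vmax = scale
    edges = np.linspace(low, high, num_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    ids = np.array([int(t[4:-1]) for t in tokens])  # strip "<ts_" and ">"
    return vmin + (centers[ids] - low) * (vmax - vmin) / (high - low)

if __name__ == "__main__":
    tokens, scale = series_to_tokens([0.0, 5.0, 10.0])
    print(tokens)
    print(tokens_to_series(tokens, scale))
```

With discrete tokens of this kind, numeric history and free text can share one input sequence, which is the sense in which a single model could accept and emit both modalities. The round trip is lossy: each reconstructed value lands on a bin center, so the error is bounded by half a bin width in the original units.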
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Bitcoin Price Prediction | Bitcoin | MSE | 3.9389 | 57 |
| Context-guided time series forecasting | PTF | MAE | 0.348 | 45 |
| Time Series Forecasting | TimeMMD Agriculture | MSE | 0.193 | 40 |
| Forecasting | Time-MMD Overall Average | Average Error | 1.213 | 21 |
| Contextual forecasting | Context Is Key | SMAPE | 70.1 | 20 |
| Time Series Reasoning | TSUR Reasoning (test) | Inductive Accuracy | 49.18 | 19 |
| Time Series Forecasting | TimeMMD Energy | MSE | 0.111 | 18 |
| Time Series Forecasting | PTF CGTSF (test) | CRPS | 0.1478 | 16 |
| Time Series Forecasting | LEU CGTSF (test) | CRPS | 0.464 | 16 |
| Time Series Forecasting | MSPG CGTSF (test) | CRPS | 0.9655 | 16 |