TimeART: Towards Agentic Time Series Reasoning via Tool-Augmentation
About
Time series data are ubiquitous in real-world cyber-physical systems. Although analyzing and interpreting them creates significant value, e.g., in disaster prediction and financial risk control, current workflows rely mainly on human data scientists, which incurs significant labor costs and limits automation. To tackle this, we introduce TimeART, a framework that combines the analytical capability of strong out-of-the-box tools with the reasoning capability of Large Language Models (LLMs) and serves as a fully agentic data scientist for Time Series Question Answering (TSQA). To teach LLM-based Time Series Reasoning Models (TSRMs) strategic tool use, we also collect TimeToolBench, a corpus of 100k expert trajectories. To enhance the generalization capability of TSRMs, we then devise a four-stage training strategy that improves TSRMs by having them learn from their own early experiences and self-reflections. Experimentally, we train an 8B TSRM on TimeToolBench and equip it with the TimeART framework; it achieves consistent state-of-the-art performance on multiple TSQA tasks, pioneering a novel approach towards agentic time series reasoning.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Time Series Question Answering | TimeMQA | Understanding Score | 59.33 | 10 |
| Stock Price Forecasting | MTBench Stock Price Forecasting (7-day) | MAE | 0.788 | 8 |
| Stock Price Forecasting | MTBench Stock Price Forecasting (30-day) | MAE | 1.122 | 8 |
| Temperature Forecasting | MTBench Temperature Forecasting (7-day) | MSE | 4.021 | 8 |
| Temperature Forecasting | MTBench Temperature Forecasting (14-day) | MSE | 5.026 | 8 |
| Stock Indicator Forecasting | MTBench Stock Indicator Forecasting (7-day) | MACD | 0.347 | 8 |
| Stock Indicator Forecasting | MTBench Stock Indicator Forecasting (30-day) | MACD Score | 0.862 | 8 |
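
For readers unfamiliar with the error metrics in the table, the following is a minimal sketch of the standard definitions of MAE (mean absolute error) and MSE (mean squared error). The function names and the sample values are illustrative assumptions; the source does not state how the benchmark scales or normalizes its inputs, so this sketch is not expected to reproduce the reported numbers.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| over all time steps."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of (actual - predicted)^2 over all time steps."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up values for illustration only; not taken from any benchmark.
observed = [1.0, 2.0, 3.0]
forecast = [1.5, 2.0, 2.0]

mae = mean_absolute_error(observed, forecast)
mse = mean_squared_error(observed, forecast)
print(f"MAE={mae:.3f} MSE={mse:.3f}")
```

For both metrics, lower values indicate a closer forecast. MSE penalizes large individual errors more heavily than MAE because each error is squared before averaging.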