TS-Agent: Understanding and Reasoning Over Raw Time Series via Iterative Insight Gathering
About
Large language models (LLMs) exhibit strong symbolic and compositional reasoning, yet they struggle with time series question answering as the data is typically transformed into an LLM-compatible modality, e.g., serialized text, plotted images, or compressed time series embeddings. Such conversions impose representation bottlenecks, often require cross-modal alignment or finetuning, and can exacerbate hallucination and knowledge leakage. To address these limitations, we propose TS-Agent, an agentic, tool-grounded framework that uses LLMs strictly for iterative evidence-based reasoning, while delegating statistical and structural extraction to time series analytical tools operating on raw sequences. Our framework solves time series tasks through an evidence-driven agentic process: (1) it alternates between thinking, tool execution, and observation in a ReAct-style loop, (2) records intermediate results in an explicit evidence log and corrects the reasoning trace via a self-refinement critic, and (3) enforces a final answer-verification step to prevent hallucinations and leakage. Across four benchmarks spanning time series understanding and reasoning, TS-Agent matches or exceeds strong text-based, vision-based, and time-series language model baselines, with the largest gains on reasoning tasks where multimodal LLMs are prone to hallucination and knowledge leakage in zero-shot settings.
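The loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tool set (`mean`, `slope`), the `Evidence` record, and the functions `scripted_policy`, `answer_from`, `verify`, and `run_agent` are all hypothetical names introduced here, and a fixed scripted policy stands in for the LLM's thinking step so the example runs without any model. It shows only the general shape: tools compute on the raw sequence, each observation is appended to an evidence log, and the final answer is accepted only if the log supports it.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def slope(series: Sequence[float]) -> float:
    """Least-squares slope of the series against its integer index."""
    n = len(series)
    x_bar = (n - 1) / 2
    y_bar = mean(series)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(series))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


# Hypothetical analytical tools that operate directly on the raw sequence.
TOOLS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "slope": slope,
}


@dataclass
class Evidence:
    """One entry of the evidence log: which tool ran and what it returned."""
    tool: str
    value: float


def scripted_policy(log: List[Evidence]) -> Optional[str]:
    """Stand-in for the LLM 'think' step (assumption: a real agent would
    prompt a model here). Returns the next tool name, or None when done."""
    used = {e.tool for e in log}
    for tool in ("slope", "mean"):
        if tool not in used:
            return tool
    return None


def answer_from(log: List[Evidence]) -> str:
    """Derive a trend label from logged evidence only."""
    s = next((e.value for e in log if e.tool == "slope"), None)
    if s is None:
        return "insufficient evidence"
    if s > 0:
        return "increasing"
    if s < 0:
        return "decreasing"
    return "flat"


def verify(answer: str, log: List[Evidence]) -> bool:
    """Final check: the answer must be backed by a logged slope value
    whose sign agrees with the claimed trend."""
    s = next((e.value for e in log if e.tool == "slope"), None)
    if s is None:
        return False
    checks = {"increasing": s > 0, "decreasing": s < 0, "flat": s == 0}
    return checks.get(answer, False)


def run_agent(series: Sequence[float], max_steps: int = 5) -> Tuple[str, List[Evidence]]:
    log: List[Evidence] = []
    for _ in range(max_steps):
        tool = scripted_policy(log)          # think
        if tool is None:
            break
        if tool not in TOOLS:                # crude critic: skip invalid calls
            continue
        log.append(Evidence(tool, TOOLS[tool](series)))  # act, then observe
    answer = answer_from(log)
    if not verify(answer, log):              # answer verification
        answer = "insufficient evidence"
    return answer, log


if __name__ == "__main__":
    answer, log = run_agent([1.0, 2.0, 3.0, 4.0])
    print(answer, [(e.tool, e.value) for e in log])
```

In this sketch the language model never sees the numbers as text or images; it would only choose tools and read their outputs, which is the division of labor the abstract describes. The critic here is reduced to rejecting unknown tool names, a deliberately simplified placeholder for the self-refinement step.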
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Time Series Reasoning | TSandLang Two TS | Accuracy | 57.3 | 12 |
| Time Series Reasoning | TSandLang One TS | Accuracy | 80.5 | 12 |
| Time Series Understanding | TSExam | Pattern Recognition | 71 | 10 |
| Time Series Reasoning | MMTS-Bench | Average Score | 60 | 9 |
| Time Series Feature Understanding | time series feature understanding benchmark | Trend Score | 98 | 8 |