
Chain-of-thought Reviewing and Correction for Time Series Question Answering

About

With the advancement of large language models (LLMs), diverse time series analysis tasks are being reformulated as time series question answering (TSQA) through a unified natural language interface. However, existing LLM-based approaches largely adopt general natural language processing techniques and are prone to reasoning errors when handling complex numerical sequences. Unlike purely textual tasks, time series data are inherently verifiable, enabling consistency checking between reasoning steps and the original input. Motivated by this property, we propose T3LLM, which performs multi-step reasoning with an explicit correction mechanism for time series question answering. The T3LLM framework consists of three LLMs, namely a worker, a reviewer, and a student, responsible for generation, review, and reasoning learning, respectively. Within this framework, the worker generates step-wise chains of thought (CoT) under structured prompts, while the reviewer inspects the reasoning, identifies erroneous steps, and provides corrective comments. The collaboratively generated, corrected CoTs are then used to fine-tune the student model, internalizing multi-step reasoning and self-correction into its parameters. Experiments on multiple real-world TSQA benchmarks demonstrate that T3LLM achieves state-of-the-art performance over strong LLM-based baselines.
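The worker/reviewer/student pipeline described above can be sketched as a simple data-generation loop. The sketch below is illustrative only: the role names follow the abstract, but every function body is a hypothetical stub (here the "reasoning step" is a verifiable mean computation), not the authors' implementation, and the real system would call actual LLMs at each stage.

```python
def worker_generate_cot(series, question):
    """Worker: produce a step-wise chain of thought under a structured prompt.

    Stub: emits a single, numerically verifiable reasoning step.
    """
    mean = sum(series) / len(series)
    return [f"Step 1: the mean of the series is {mean:.2f}"]

def reviewer_check(series, steps):
    """Reviewer: verify each step against the original series.

    Returns one comment per step: "OK" or a corrective comment.
    This exploits the verifiability of time series data noted in the abstract.
    """
    mean = sum(series) / len(series)
    comments = []
    for step in steps:
        claimed = float(step.rsplit(" ", 1)[-1])  # parse the claimed value
        if abs(claimed - mean) > 1e-6:
            comments.append(f"Correction: the mean should be {mean:.2f}")
        else:
            comments.append("OK")
    return comments

def build_training_example(series, question, steps, comments):
    """Assemble the corrected CoT into a fine-tuning example for the student."""
    corrected = [
        step if comment == "OK" else comment.replace("Correction: ", "Fixed step: ")
        for step, comment in zip(steps, comments)
    ]
    return {"question": question, "series": series, "cot": corrected}

series = [1.0, 2.0, 3.0, 4.0]
question = "What is the mean of the series?"
steps = worker_generate_cot(series, question)
comments = reviewer_check(series, steps)
example = build_training_example(series, question, steps, comments)
# example["cot"] holds the reviewed chain of thought for student fine-tuning
```

The key design point mirrored here is the review stage: because each reasoning step makes a checkable numerical claim about the input series, erroneous steps can be detected and replaced before the CoT is used as training data.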

Chen Su, Yuanhe Tian, Yan Song • 2025

Related benchmarks

Task                                  Dataset    Metric     Result    Rank
Anomaly Detection                     TMQA OPE   Accuracy   53.2      8
Multiple-choice Question Answering    CTQA       Accuracy   66.5      8
Multiple-choice Question Answering    TMQA       Accuracy   63.5      8
Time Series Forecasting               TMQA OPE   RMSE       7.59e+3   8
Time Series Imputation                TMQA OPE   RMSE       1.32e+3   8
Time Series Classification            TMQA OPE   Accuracy   57.2      8
True/False Question Answering         TMQA       Accuracy   0.766     8
