Time Series Reasoning via Process-Verifiable Thinking Data Synthesis and Scheduling for Tailored LLM Reasoning

About

Time series is a pervasive data type across application domains, making reliable solutions to diverse time series tasks a long-standing goal. Recent advances in large language models (LLMs), especially their reasoning abilities unlocked through reinforcement learning (RL), have opened new opportunities for tackling tasks with long Chain-of-Thought (CoT) reasoning. However, leveraging LLM reasoning for time series remains in its infancy, hindered by three gaps: the absence of carefully curated time series CoT data for training, limited data efficiency caused by underexplored data scheduling, and the lack of RL algorithms tailored for exploiting such time series CoT data. In this paper, we introduce VeriTime, a framework that tailors LLMs for time series reasoning through data synthesis, data scheduling, and RL training. First, we propose a data synthesis pipeline that constructs a TS-text multimodal dataset with process-verifiable annotations. Second, we design a data scheduling mechanism that arranges training samples according to a principled hierarchy of difficulty and task taxonomy. Third, we develop a two-stage reinforcement finetuning procedure featuring fine-grained, multi-objective rewards that leverage verifiable process-level CoT data. Extensive experiments show that VeriTime substantially boosts LLM performance across diverse time series reasoning tasks. Notably, it enables compact 3B and 4B models to achieve reasoning capabilities on par with or exceeding those of larger proprietary LLMs.
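The abstract describes data scheduling only at a high level. As a rough illustration of what ordering samples by task taxonomy and difficulty could look like, here is a minimal sketch. The task tier names, the `Sample` fields, the `difficulty` score, and the `schedule` function are all hypothetical and are not taken from the paper; the actual VeriTime mechanism may differ substantially.

```python
from dataclasses import dataclass

# Hypothetical task tiers, ordered from simpler to harder (an assumption,
# not the paper's taxonomy).
TASK_ORDER = {"perception": 0, "calculation": 1, "reasoning": 2}


@dataclass(frozen=True)
class Sample:
    sample_id: str
    task: str          # must be a key of TASK_ORDER
    difficulty: float  # assumed score in [0, 1]; higher means harder


def schedule(samples):
    """Order samples by task tier first, then by difficulty within a tier."""
    return sorted(samples, key=lambda s: (TASK_ORDER[s.task], s.difficulty))


samples = [
    Sample("a", "reasoning", 0.2),
    Sample("b", "perception", 0.9),
    Sample("c", "calculation", 0.5),
    Sample("d", "perception", 0.1),
]
ordered_ids = [s.sample_id for s in schedule(samples)]
print(ordered_ids)  # ['d', 'b', 'c', 'a']
```

Sorting on a `(tier, difficulty)` tuple keeps the sketch deterministic and easy to inspect; a real curriculum might instead mix tiers or resample by difficulty over the course of training.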

Jiahui Zhou, Dan Li, Boxin Li, Xiao Zhang, Erli Meng, Lin Li, Zhuomin Chen, Jian Lou, See-Kiong Ng • 2026

Related benchmarks

Task                                Dataset         Result                  Rank
Knowledge-based reasoning           CTU             Accuracy 67.5           19
Knowledge-based reasoning           ECG             Accuracy (%) 30.3       19
Knowledge-based reasoning           RCW             Accuracy 64.89          19
Knowledge-based reasoning           EMG             Accuracy 65.31          19
Inferential Calculation             TSRBench        Accuracy 77.14          15
Scenario Attribution                TSRBench        Accuracy 87.5           15
Scenario-based Reasoning (Overall)  TSRBench        Overall Accuracy 86.55  15
Anomaly Detection                   TSRBench        Accuracy 91.11          15
General Numerical Reasoning         DROP            Accuracy 82.55          4
Synthetic Time Series Reasoning     TimeSeriesExam  Accuracy 47.27          4
