Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics

About

Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models(TSRMs). To bridge this gap, we formalize Time Series Reasoning (TSR) via a four-level taxonomy of increasing cognitive complexity. We introduce HiTSR, a hierarchical time series reasoning dataset comprising 83k samples with diverse task combinations and verified Chain-of-Thought (CoT) trajectories. Leveraging HiTSR, we propose LLaTiSA, a strong TSRM that integrates visualized patterns with precision-calibrated numerical tables to enhance the temporal perception of Vision-Language Models (VLMs). Through a multi-stage curriculum fine-tuning strategy, LLaTiSA achieves superior performance and exhibits robust out-of-distribution generalization across diverse TSR tasks and real-world scenarios. Our code is available at https://github.com/RainingNovember/LLaTiSA.

Yueyang Ding, HaoPeng Zhang, Rui Dai, Yi Wang, Tianyu Zong, Kaikui Liu, Xiangxiang Chu• 2026

Related benchmarks

TaskDatasetResultRank
Global Pattern PerceptionMMTS-Bench
Accuracy97.5
15
Local Pattern PerceptionBEDTime
Accuracy75.6
15
Series ComparisonMCQ2
Accuracy67
15
Min & Max LocalizationHITSR-L1 real-world
Accuracy (%)86.8
12
Hierarchical Time Series ReasoningHiTSR--
11
Pattern PerceptionLLATISA OOD
Local Accuracy75.6
9
Predictive InferenceL4 Out-of-Distribution (test)
Accuracy83.3
9
Semantic ReasoningLLATISA OOD
Series Comparison Accuracy67
9
Numerical Read-outLLATISA OOD
Min & Max Localization Accuracy86.8
7
ECG GroundingHiTSR L3 (OOD)
Diagnosis Accuracy62.2
5
Showing 10 of 14 rows

Other info

GitHub

Follow for update