LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics

About

Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and by benchmarks with inherent ambiguities, which preclude rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge this gap, we formalize Time Series Reasoning (TSR) via a four-level taxonomy of increasing cognitive complexity. We introduce HiTSR, a hierarchical time series reasoning dataset comprising 83k samples with diverse task combinations and verified Chain-of-Thought (CoT) trajectories. Leveraging HiTSR, we propose LLaTiSA, a strong TSRM that integrates visualized patterns with precision-calibrated numerical tables to enhance the temporal perception of Vision-Language Models (VLMs). Through a multi-stage curriculum fine-tuning strategy, LLaTiSA achieves superior performance and exhibits robust out-of-distribution generalization across diverse TSR tasks and real-world scenarios. Our code is available at https://github.com/RainingNovember/LLaTiSA.
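The abstract does not say how the numerical table is "precision-calibrated". Below is a minimal sketch under one assumption: the number of printed decimal places is chosen from the series' value range so that rounding error stays below a fixed fraction (`rel_tol`) of that range. The names `calibrated_decimals` and `series_to_table`, and the parameters `rel_tol` and `max_decimals`, are hypothetical and not taken from the LLaTiSA code. In the setting described above, such a table would accompany a rendered plot of the same series; plotting is omitted here to keep the sketch dependency-free.

```python
import math
from typing import Sequence


def calibrated_decimals(values: Sequence[float], rel_tol: float = 1e-3,
                        max_decimals: int = 6) -> int:
    """Pick how many decimal places to print for a series (hypothetical rule).

    Assumed rule: choose the smallest non-negative d with 10**-d <= rel_tol * scale,
    where scale is the series range, so rounding moves each value by at most half
    of rel_tol * scale (unless d is capped by max_decimals).
    """
    if not values:
        raise ValueError("values must be non-empty")
    if rel_tol <= 0:
        raise ValueError("rel_tol must be positive")
    scale = max(values) - min(values)
    if scale == 0:
        # Constant series: fall back to the values' magnitude as the scale.
        scale = max(abs(v) for v in values)
        if scale == 0:
            return 0  # all zeros: nothing to resolve
    d = math.ceil(-math.log10(rel_tol * scale))
    return max(0, min(d, max_decimals))


def series_to_table(values: Sequence[float], rel_tol: float = 1e-3,
                    max_decimals: int = 6) -> str:
    """Render a series as a tab-separated (time index, value) table."""
    d = calibrated_decimals(values, rel_tol, max_decimals)
    rows = ["t\tvalue"]
    rows += [f"{t}\t{v:.{d}f}" for t, v in enumerate(values)]
    return "\n".join(rows)


if __name__ == "__main__":
    print(series_to_table([0.0, 2.5, 8.0]))
```

For example, `[0.0, 2.5, 8.0]` has a range of 8, so `rel_tol * range` is 0.008 and the values are printed with 3 decimals (`0.000`, `2.500`, `8.000`), while a series spanning 0 to 500 would get only 1 decimal. Tying precision to the range keeps small fluctuations visible without flooding the model's context with meaningless digits.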

Yueyang Ding, HaoPeng Zhang, Rui Dai, Yi Wang, Tianyu Zong, Kaikui Liu, Xiangxiang Chu · 2026

Related benchmarks

Task	Dataset	Metric	Result
Global Pattern Perception	MMTS-Bench	Accuracy	97.5	15
Local Pattern Perception	BEDTime	Accuracy	75.6	15
Series Comparison	MCQ2	Accuracy	67	15
Min & Max Localization	HiTSR-L1 real-world	Accuracy (%)	86.8	12
Hierarchical Time Series Reasoning	HiTSR	--	--	11
Pattern Perception	LLaTiSA OOD	Local Accuracy	75.6	9
Predictive Inference	L4 Out-of-Distribution (test)	Accuracy	83.3	9
Semantic Reasoning	LLaTiSA OOD	Series Comparison Accuracy	67	9
Numerical Read-out	LLaTiSA OOD	Min & Max Localization Accuracy	86.8	7
ECG Grounding	HiTSR L3 (OOD)	Diagnosis Accuracy	62.2	5

