TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning

About

Time-series reasoning remains a significant challenge for multimodal large language models (MLLMs) because of dynamic temporal patterns, ambiguous semantics, and a lack of temporal priors. In this work, we introduce TimeMaster, a reinforcement learning (RL)-based method that enables time-series MLLMs to perform structured, interpretable reasoning directly over visualized time-series inputs and task prompts. TimeMaster adopts a three-part structured output format (reasoning, classification, and domain-specific extension) and is optimized via a composite reward function that aligns format adherence, prediction accuracy, and open-ended insight quality. The model is trained using a two-stage pipeline: we first apply supervised fine-tuning (SFT) to establish a good initialization, then apply Group Relative Policy Optimization (GRPO) at the token level to enable stable and targeted reward-driven improvement in time-series reasoning. We evaluate TimeMaster, built on Qwen2.5-VL-3B-Instruct, on the TimerBed benchmark across six real-world classification tasks. TimeMaster achieves state-of-the-art performance, outperforming classical time-series models and few-shot GPT-4o by over 14.6% and 7.3%, respectively. Notably, TimeMaster goes beyond time-series classification: it also exhibits expert-like reasoning behavior, generates context-aware explanations, and delivers domain-aligned insights. Our results highlight that reward-driven RL can be a scalable and promising path toward integrating temporal understanding into time-series MLLMs.

Junru Zhang, Lang Feng, Xu Guo, Yuhan Wu, Yabo Dong, Duanqing Xu • 2025
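
The abstract above describes a composite reward (format adherence, prediction accuracy, insight quality) optimized with GRPO. The following is a minimal Python sketch of how such a reward and GRPO's group-relative advantages might be computed. It is not the authors' implementation: the tag names (`<reasoning>`, `<class>`, `<extension>`), the weights, the all-or-nothing format gate, and the externally supplied `insight_score` are all assumptions made for illustration.

```python
import re
from statistics import mean, pstdev

# Hypothetical tag names: the source only states that the output has three
# parts (reasoning, classification, domain-specific extension).
PATTERN = re.compile(
    r"^<reasoning>(.+?)</reasoning>\s*"
    r"<class>(.+?)</class>\s*"
    r"<extension>(.*?)</extension>\s*$",
    re.DOTALL,
)


def composite_reward(output, true_label, insight_score=0.0,
                     w_format=0.2, w_acc=1.0, w_insight=0.3):
    """Combine format, accuracy, and insight terms into one scalar.

    insight_score is assumed to come from an external judge, scaled to
    [0, 1]. The weights are illustrative, not values from the paper.
    """
    match = PATTERN.match(output.strip())
    if match is None:
        return 0.0  # in this sketch, malformed outputs earn no reward
    predicted = match.group(2).strip().lower()
    accuracy = 1.0 if predicted == true_label.strip().lower() else 0.0
    return w_format + w_acc * accuracy + w_insight * insight_score


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its sampling group: (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group of sampled responses for one prompt whose true label is "A".
group = [
    "<reasoning>rising trend</reasoning><class>A</class><extension>n/a</extension>",
    "<reasoning>flat signal</reasoning><class>B</class><extension>n/a</extension>",
    "no structure at all",
]
rewards = [composite_reward(o, "A") for o in group]
advantages = group_relative_advantages(rewards)
```

Responses that score above their group's mean receive positive advantages and the rest negative ones, so no separate value network is needed. In GRPO-style training, each response's advantage is typically applied to the tokens of that response when computing the policy-gradient loss.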

Related benchmarks

Task                        Dataset           Metric                    Result   Rank
Time Series Reasoning       SLEEP QA          Acc                       0.7255   22
Time Series Reasoning       RCW               Accuracy                  76.99    22
Time Series Reasoning       TSQA              Accuracy                  61.22    22
Time Series Reasoning       ECG-QA            Accuracy                  69.31    22
Time Series Reasoning       TRQA              Accuracy                  72.08    22
Time Series Reasoning       ETI               Accuracy                  49       22
Anomaly Location Detection  AnomLLM (test)    Frequency Precision (P)   57.3     14
Anomaly Classification      AnomLLM (test)    Accuracy                  57.9     13