A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization

About

Large language models (LLMs), with demonstrated reasoning abilities across multiple domains, remain largely underexplored for time-series reasoning (TsR), which is ubiquitous in the real world. In this work, we propose TimerBed, the first comprehensive testbed for evaluating LLMs' TsR performance. Specifically, TimerBed includes stratified reasoning patterns with real-world tasks, comprehensive combinations of LLMs and reasoning strategies, and various supervised models as comparison anchors. We perform extensive experiments with TimerBed, test multiple current beliefs, and verify the initial failures of LLMs in TsR, evidenced by the ineffectiveness of zero-shot (ZST) prompting and the performance degradation of few-shot in-context learning (ICL). Further, we identify one possible root cause: the numerical modeling of data. To address this, we propose VL-Time, a prompt-based solution that uses visualization-modeled data and language-guided reasoning. Experimental results demonstrate that VL-Time enables multimodal LLMs to be non-trivial ZST and powerful ICL reasoners for time series, achieving about a 140% average performance improvement and a 99% average reduction in token costs.
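Read as relative changes (an assumption; the abstract does not define the metric), a 140% improvement means the new score is about 2.4 times the baseline, and a 99% token reduction leaves about 1% of the original prompt cost. The minimal sketch below shows that arithmetic. It further assumes each "average" is the mean of per-task relative changes, which the abstract does not specify; the helper names and all per-task numbers are hypothetical and chosen only to illustrate the calculation, not taken from the paper.

```python
from statistics import mean


def relative_change(baseline: float, new: float) -> float:
    """Relative change of `new` versus `baseline` as a fraction (1.4 means +140%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


def average_relative_change(pairs: list[tuple[float, float]]) -> float:
    """Mean of per-task relative changes over (baseline, new) pairs (assumed averaging scheme)."""
    return mean(relative_change(b, n) for b, n in pairs)


# Hypothetical per-task numbers for illustrating the arithmetic only (NOT results from the paper).
accuracy_pairs = [(25.0, 60.0), (30.0, 72.0)]         # (baseline, VL-Time) accuracy
token_pairs = [(10_000.0, 100.0), (20_000.0, 200.0)]  # (baseline, VL-Time) prompt tokens

if __name__ == "__main__":
    # The "%" format type multiplies by 100, so 1.4 prints as +140% and -0.99 as -99%.
    print(f"accuracy: {average_relative_change(accuracy_pairs):+.0%}")
    print(f"tokens:   {average_relative_change(token_pairs):+.0%}")
```

Under these made-up inputs the script prints an accuracy change of +140% and a token change of -99%. Averaging per-task ratios (as here) and taking the ratio of averaged scores can give different numbers, so the exact figure depends on how the paper aggregates across tasks.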

Haoxin Liu, Chenghao Liu, B. Aditya Prakash • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Time Series Reasoning | ECG-QA | Accuracy | 64.36 | 22
Time Series Reasoning | TRQA | Accuracy | 58.5 | 22
Time Series Reasoning | RCW | Accuracy | 57.52 | 22
Time Series Reasoning | SLEEP QA | Accuracy | 0.2353 | 22
Time Series Reasoning | TSQA | Accuracy | 44.93 | 22
Time Series Reasoning | ETI | Accuracy | 28.5 | 22
Classification | TimerBed | Accuracy | 33.2 | 9
Regression | TSQA | MAE | 61.294 | 9
Question Answering | MTBench Weather | Accuracy | 61.3 | 9
Classification | TSQA | Accuracy | 49.8 | 9

Showing 10 of 16 rows.