
Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning

About

Knowledge in the real world is constantly being updated, yet it is costly to frequently update large language models (LLMs). It is therefore crucial for LLMs to understand the concept of temporal knowledge. However, prior work on temporal question answering (TQA) did not emphasize multi-answer and multi-hop types of temporal reasoning. In this paper, we propose Complex-TR, a complex temporal question answering dataset that focuses on multi-answer and multi-hop temporal reasoning. In addition, we propose a novel data augmentation strategy to improve the complex temporal reasoning capability and robustness of LLMs. We conducted experiments on multiple temporal QA datasets, and the results show that our method improves LLMs' performance on temporal QA benchmarks by significant margins. Our code and data are released at: https://github.com/nusnlp/complex-tr.

Qingyu Tan, Hwee Tou Ng, Lidong Bing • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|------|---------|--------|------|
| Multi-hop Question Answering | Complex-TR ODQA 1.0 (test) | Set Accuracy: 0.312 | 13 |
| Single-hop Question Answering | Complex-TR ODQA 1.0 (test) | Set Accuracy: 49 | 13 |
| Temporal Question Answering | ReasonQA Single-hop | Set Accuracy: 95.1 | 7 |
| Temporal Question Answering | ReasonQA Multi-hop | Set Accuracy: 85 | 7 |
| Temporal Question Answering | TimeQA Hard | EM: 52.7 | 7 |
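The table reports two metrics: Set Accuracy for the multi-answer benchmarks and EM for TimeQA Hard. As a point of reference, below is a minimal sketch of how these metrics are commonly computed for multi-answer QA. The normalization scheme here is an assumption for illustration; the paper's official evaluation script in the linked repository may differ.

```python
from typing import Iterable

def normalize(answer: str) -> str:
    """Minimal answer normalization (an assumption; real scripts often
    also strip articles and punctuation)."""
    return answer.strip().lower()

def set_accuracy(pred: Iterable[str], gold: Iterable[str]) -> float:
    """Set Accuracy: 1.0 only if the predicted answer set exactly
    matches the gold answer set after normalization, else 0.0."""
    return float({normalize(a) for a in pred} == {normalize(a) for a in gold})

def exact_match(pred: str, gold: str) -> float:
    """EM: 1.0 if the normalized predicted string equals the gold string."""
    return float(normalize(pred) == normalize(gold))

# Hypothetical multi-answer temporal question with two gold answers.
gold_answers = ["Barack Obama", "Joe Biden"]
predictions = ["joe biden", "Barack Obama"]
print(set_accuracy(predictions, gold_answers))  # 1.0 (order-insensitive)
```

Because Set Accuracy requires the full answer set to match, it penalizes both missing and spurious answers, which makes the multi-answer benchmarks stricter than per-answer EM.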

Other info

Code: https://github.com/nusnlp/complex-tr
