
TimeQA

Benchmarks

Task Name                    Dataset Name            Metric         SOTA Result  Trend
Temporal Reasoning           TimeQA Hard 1.0 (test)  EM             82.2         24
Temporal Reasoning           TimeQA Easy 1.0 (test)  EM             93.7         24
Temporal Question Answering  TimeQA Easy             R-1            63.4         20
Question Answering           TimeQA                  GPT Accuracy   51.26        14
Temporal Question Answering  TimeQA Hard v1          R-1            0.504        12
Temporal Question Answering  TimeQA Easy v1          R-1 Score      58           12
Temporal Question Answering  TimeQA Hard             EM             52.7         7
Factual Reasoning            TimeQA v2               Baseline Wins  4            2