TimeQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Temporal Question Answering | TimeQA Hard | EM 77.7 | 25 |
| Temporal Reasoning | TimeQA Hard 1.0 (test) | EM 82.2 | 24 |
| Temporal Reasoning | TimeQA Easy 1.0 (test) | EM 93.7 | 24 |
| Temporal Question Answering | TimeQA Easy | R-1 63.4 | 20 |
| Temporal Question Answering | TimeQA Easy-mode | Exact Match (EM) 85.4 | 18 |
| Question Answering | TimeQA | GPT Accuracy 51.26 | 14 |
| Temporal Question Answering | TimeQA Hard v1 | R-1 0.504 | 12 |
| Temporal Question Answering | TimeQA Easy v1 | R-1 Score 58 | 12 |
| Temporal Question Answering | TimeQA | Nugget R@20 49.7 | 9 |
| Factual Reasoning | TimeQA v2 | Baseline Wins 4 | 2 |