Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

ReasonQA

Benchmarks

Task NameDataset NameSOTA ResultTrend
Temporal Question AnsweringReasonQA Multi-hop
Set Accuracy85
7
Temporal Question AnsweringReasonQA Single-hop
Set Accuracy95.1
7
Showing 2 of 2 rows