
Knowledge-intensive reasoning suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Knowledge-Intensive Reasoning | Knowledge-Intensive Reasoning Suite (2Wiki., Bamb., HQA, MuSi., SimQA) | 2Wiki Score: 58.4 | 25 |
| Knowledge-intensive reasoning | Knowledge-intensive reasoning suite (HotpotQA, 2WikiMultihopQA, Musique) | HotpotQA Score: 43.6 | 6 |