Multi-hop QA Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Multi-hop Question Answering | Multi-hop QA Suite (HotpotQA, 2Wiki, MuSiQue, G-Medical, G-Novel) | Average Score: 59.25 | 20