
Standard QA Benchmarks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Question Answering | Standard QA Benchmarks (2WikiMultiHopQA, HotpotQA, Bamboogle, MuSiQue, Natural Questions, TriviaQA, PopQA) (test) | 2WikiMultiHopQA Pass@1: 80.4 | 11
Open-domain Question Answering | Standard QA Benchmarks Average | Avg@4: 61 | 9