MoreHopQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multi-hop Question Answering | MoreHopQA | Accuracy 86.4 | 25 |
| Uncertainty Estimation | MoreHopQA Camel | AUROC 65.29 | 16 |
| Multi-hop Question Answering | MorehopQA | AUROC 0.6457 | 16 |
| Uncertainty Estimation | MoreHopQA AutoGen (test) | AUROC 63.92 | 16 |
| Open-ended Question Answering | MoreHopQA (test) | Accuracy 77 | 11 |
| Multi-hop QA Retrieval | MoreHopQA | NDCG 0.908 | 5 |
| Question Answering | MoreHopQA | Inference Time (s) 7.27 | 4 |