MoreHopQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| First-error step detection | MoreHopQA | AUROC 0.7885 | 27 |
| Step-wise Confidence Attribution | MoreHopQA | AUROC 0.8084 | 27 |
| Multi-hop Question Answering | MoreHopQA | Accuracy 86.4 | 25 |
| Multi-hop Retrieval | Morehopqa (test) | Recall 80.5 | 16 |
| Uncertainty Estimation | MoreHopQA Camel | AUROC 65.29 | 16 |
| Multi-hop Question Answering | MorehopQA | AUROC 0.6457 | 16 |
| Uncertainty Estimation | MoreHopQA AutoGen (test) | AUROC 63.92 | 16 |
| Stepwise error detection | MoreHopQA (test) | AUROC 0.808 | 15 |
| Open-ended Question Answering | MoreHopQA (test) | Accuracy 77 | 11 |
| Multi-hop Question Answering | MoreHopQA (test) | Accuracy 53.4 | 9 |
| Multi-hop QA Retrieval | MoreHopQA | NDCG 0.908 | 5 |
| Question Answering | MoreHopQA | Inference Time (s) 7.27 | 4 |