| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| First-error Step Detection | MoreHopQA | AUROC 0.7885 | 27 |
| Step-wise Confidence Attribution | MoreHopQA | AUROC 0.8084 | 27 |
| Multi-hop Question Answering | MoreHopQA | Accuracy 86.4 | 25 |
| Multi-hop Retrieval | MoreHopQA (test) | Recall 80.5 | 16 |
| Uncertainty Estimation | MoreHopQA Camel | AUROC 65.29 | 16 |
| Multi-hop Question Answering | MoreHopQA | AUROC 0.6457 | 16 |
| Uncertainty Estimation | MoreHopQA AutoGen (test) | AUROC 63.92 | 16 |
| Stepwise Error Detection | MoreHopQA (test) | AUROC 0.808 | 15 |
| Open-ended Question Answering | MoreHopQA (test) | Accuracy 77 | 11 |
| Multi-hop Question Answering | MoreHopQA (test) | Accuracy 53.4 | 9 |
| Multi-hop QA Retrieval | MoreHopQA | NDCG 0.908 | 5 |
| Question Answering | MoreHopQA | Inference Time (s) 7.27 | 4 |