| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Question Answering | qasper | F1 Score 36.9 | 61 | |
| Single-document retrieval | Qasper | F1 Score 50.3 | 44 | |
| Language Generation | QASPER | Accuracy 15.35 | 35 | |
| Single-hop Question Answering | Qasper | Score 44.79 | 22 | |
| Question Answering | QASPER 1200:251 (test) | Answerable EM 28.92 | 20 | |
| Long-context Question Answering | Qasper | F1 83.09 | 17 | |
| Question Answering | Qasper | F1 Score 0.3677 | 16 | |
| Question Answering | QASPER (test) | F1 Score (Match) 55.7 | 16 | |
| Question Answering | Qasper | Recall 67.3 | 15 | |
| Multi-session Retrieval-Augmented Generation | QASPER (test) | F1 Score 36 | 12 | |
| Speculative Decoding | Qasper | SR 1.66 | 12 | |
| Document Question Answering | Qasper | Accuracy 0.552 | 11 | |
| Single-document retrieval | Qasper | Latency (s) 0.0054 | 11 | |
| Long document retrieval | Qasper (test) | F1 Score 46.18 | 11 | |
| Completeness | QASPER | Kendall's Tau 0.44 | 11 | |
| Long-context Question Answering | Qasper | Extract F1 54.57 | 10 | |
| Faithfulness Evaluation | Qasper yes/no question answering | AOPC@10 0.102 | 10 | |
| Question Answering | Qasper (val) | F1 28.8 | 10 | |
| Question Answering | QASPER Multi-Document 4 | Accuracy 76.2 | 9 | |
| Question Answering | QASPER Extractive (test) | F1 53.3 | 8 | |
| Question Answering | QASPER Extractive (dev) | F1 29.6 | 8 | |
| RAG-Completeness | QASPER (test) | Mean Kendall Tau Correlation 0.44 | 6 | |
| Document Retrieval | Qasper | R@1 0.25 | 2 | |
| Question Answering | QASPER SCROLLS (val) | F1 Score 0.275 | 2 | |
| Question Answering | Qasper (dev) | F1 40.6 | 2 |