| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | QASPER (test) | F1 Score (Match) | 61.5 | 132 |
| Question Answering | qasper | F1 Score | 36.9 | 61 |
| Document Question Answering | Qasper | Accuracy | 40.8 | 44 |
| Single-document retrieval | Qasper | F1 Score | 50.3 | 44 |
| Text Question Answering | Qasper | Accuracy | 61.3 | 37 |
| Language Generation | QASPER | Accuracy | 15.35 | 35 |
| Retrieval | QASPER (test) | F1 Score | 30.27 | 30 |
| Single-hop Question Answering | Qasper | Score | 44.79 | 22 |
| Question Answering | QASPER 1200:251 (test) | Answerable EM | 28.92 | 20 |
| Question Answering | Qasper | Precision | 84 | 18 |
| Question Answering | Qasper | EM Score | 65 | 18 |
| Long-context Question Answering | Qasper 128K context | F1 Score | 39 | 18 |
| Long-context Question Answering | Qasper | F1 | 83.09 | 17 |
| Question Answering | QASPER Long-doc | R@1 | 61.91 | 16 |
| Uncertainty Estimation | QASPER | AUROC | 0.722 | 16 |
| Question Answering | Qasper | F1 Score | 0.3677 | 16 |
| Question Answering | QASPER | Rouge-L | 38.8 | 15 |
| Question Answering | Qasper | Recall | 67.3 | 15 |
| Question Answering | Qasper | ASR Score | 30 | 14 |
| Question Answering | QASPER | TTFT (ms) | 233.2 | 12 |
| Question Answering | QASPER | Peak GPU Memory (GB) | 0.53 | 12 |
| Multi-session Retrieval-Augmented Generation | QASPER (test) | F1 Score | 36 | 12 |
| Speculative Decoding | Qasper | SR | 1.66 | 12 |
| Single-document retrieval | Qasper | Latency (s) | 0.0054 | 11 |
| Long document retrieval | Qasper (test) | F1 Score | 46.18 | 11 |