| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | TQA (test) | AUROC | 90.2 | 90 |
| Question Answering | TQA | Absolute Execution Time Overhead (s) | 0.173 | 90 |
| Question Answering | TQA | PRR | 86.1 | 90 |
| Question Answering | TQA | Accuracy | 92.3 | 80 |
| Question Answering | TQA | Accuracy | 76.8 | 60 |
| Table Question Answering | TQA FinQA, HiTab, TAT-QA, TabMWP, WTQ | FinQA Accuracy | 40.48 | 20 |
| Question Answering | TQA Poison Attack (test) | Accuracy | 75.6 | 18 |
| Question Answering | TQA PIA Attack (test) | Accuracy | 76.4 | 18 |
| Knowledge gap detection | TQA | Accuracy | 83.2 | 18 |
| Question Answering | TQA poison @ Position 10, k=10 (test) | Robustness Accuracy | 71 | 15 |
| Question Answering | TQA poison @ Position 1, k=10 (test) | Robustness Accuracy | 66.4 | 15 |
| Question Answering | TQA | EM | 42.12 | 14 |
| Visual Question Answering | TQA | Accuracy | 77.5 | 13 |
| Inference Efficiency | TQA | Relative Execution Time Overhead | 0.05 | 12 |
| Open-Domain Question Answering | TQA (test) | EM | 66.45 | 11 |
| Visual Reasoning | TQA | Accuracy | 86.7 | 8 |
| Open-Domain Question Answering | TQA | Accuracy | 71.4 | 8 |
| Information Retrieval | TQA (test) | Recall@5 | 78.3 | 8 |
| Retrieval-Augmented Generation | TQA open | Accuracy | 46.24 | 8 |
| Textbook Question Answering | TQA (test) | Accuracy | 86.7 | 7 |
| Retrieval | TQA | NDCG@10 | 59.02 | 6 |
| Question Answering | TQA Benign (test) | Accuracy | 76.4 | 6 |
| Context Compression & QA | TQA (val) | EM | 59.7 | 6 |
| Open-Domain Question Answering | TQA | R@1 | 37.39 | 4 |
| Open-Domain Question Answering | TQA | Exact Match (EM) | 75.23 | 3 |