| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | PubMedQA | Accuracy 83.6 | 145 |
| Question Answering | PubMedQA (test) | Accuracy 81.8 | 81 |
| Medical Question Answering | PubMedQA | Accuracy 81.4 | 45 |
| Question Answering | PubMedQA PQA-L (test) | Accuracy 78.2 | 25 |
| Question Answering | PubMedQA | EM 79.82 | 18 |
| Prompt Leakage Attack | PubMedQA | ASR (500) 14 | 16 |
| Question Answering | PubMedQA | Context Influence 115.78 | 15 |
| Question Answering | PubMedQA (out-of-domain) | ROUGE-L 11.7 | 14 |
| Biomedical Question Answering | PubMedQA PQA-L In-Domain (test) | Accuracy 78 | 11 |
| Close-ended QA | PubMedQA | Accuracy 85 | 10 |
| Medical Question Answering | PubMedQA Reasoning Required | Accuracy 82 | 10 |
| Question Answering | PubMedQA | Accuracy 78.6 | 9 |
| Evaluating Context Influence and Input Regurgitation | PubMedQA | Context Influence Score I(D; y_tilde) 97.95 | 9 |
| Retrieval-Augmented Generation | PubMedQA | Accuracy 77.9 | 8 |
| Medical Question Answering | PubMedQA Synthetic NIID 1.0 (test) | Accuracy 75.1 | 7 |
| Medical Question Answering | PubMedQA Synthetic IID 1.0 (test) | Accuracy 75.1 | 7 |
| Pre-training data contamination detection | PubMedQA (PMQA) (test) | AUC 0.54 | 7 |
| Question Answering | PubMedQA | Acc 66.4 | 6 |
| Question Answering | PubMedQA | BLEU-1 9.7 | 6 |
| Question Answering | PubmedQA | F1 43.21 | 5 |
| Question Answering | PubMedQA English (test) | Accuracy 74.64 | 5 |
| Question Answering | PubMedQA official (val) | F1 Score 93.33 | 4 |
| Medical Question Answering | PubMedQA | Pass@1 86 | 4 |
| Long-form QA | PubMedQA (test) | ROUGE-1 37.49 | 4 |