| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | BioASQ | Accuracy | 98.32 | 57 |
| Question Answering | BioASQ (dev) | F1 Score | 77.8 | 28 |
| Biomedical Reasoning | BioASQ out-of-domain | Accuracy | 91.87 | 25 |
| Domain Adaptation | BioASQ (test) | BBH | 54.89 | 20 |
| Medical Question Answering | BioASQ | Accuracy | 80.74 | 20 |
| Biomedical Multi-hop Question Answering | BioASQ-B | EM | 40.6 | 18 |
| Extractive Question Answering | BioASQ (test) | EM | 47.27 | 16 |
| Snippet Retrieval | BioASQ 7 (test batches 1-5) | MAP | 0.2518 | 16 |
| Document Retrieval | BioASQ 7 (test batches 1-5) | MAP | 19.24 | 16 |
| Question Answering | BioASQ MRQA out-of-domain evaluation 2019 (test) | EM | 60.3 | 15 |
| Reading Comprehension | BioASQ MRQA out-of-domain | EM | 67.62 | 14 |
| Question Answering | BioASQ factoid 7b (test) | SAcc | 47.4 | 13 |
| Extractive Question Answering | BioASQ MRQA | F1 Score | 91 | 12 |
| Biomedical Question Answering | BioASQ | Factoid Acc | 29 | 11 |
| Retrieval | BioASQ (test) | Top-20 | 46 | 9 |
| Document Classification | BioASQ | Macro F1 | 71.28 | 8 |
| Medical Question Answering | BioASQ (test) | ROUGE-1 | 28.55 | 8 |
| Question Answering Retrieval | BioASQ | nDCG@10 | 76.9 | 8 |
| Generative Question Answering | BioASQ (test) | EM | 43.01 | 8 |
| RAG Poisoning Attack (Document-Level Targeting) | BioASQ | RSR@5 | 48.9 | 7 |
| Fact-Level RAG Poisoning Attack | BioASQ | RSR@5 | 75.7 | 7 |
| Extractive Question Answering | BioASQ (50% coverage) | F1 Score | 76.13 | 7 |
| Extractive Question Answering | BioASQ (20% coverage) | F1 Score | 85.06 | 7 |
| Snippet Retrieval | BioASQ 7 (test batches 4 and 5) | Snippet MAP | 23.96 | 7 |
| Document Retrieval | BioASQ 7 (test batches 4 and 5) | Document MAP | 16.55 | 7 |
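The metrics in the table (EM, token-level F1, MAP, nDCG@10) follow standard QA and retrieval definitions. A minimal sketch of those scorers is below; these are the conventional formulations (SQuAD-style EM/F1, binary-relevance MAP and nDCG), not the exact evaluation scripts used by the systems on this leaderboard.

```python
import math
from collections import Counter

def exact_match(pred: str, gold: str) -> int:
    """1 if the whitespace/case-normalized prediction equals the gold answer."""
    norm = lambda s: " ".join(s.lower().split())
    return int(norm(pred) == norm(gold))

def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1, as in SQuAD-style extractive QA scoring."""
    p, g = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

def average_precision(rels: list[int]) -> float:
    """AP over a binary relevance list in ranked order.
    Assumes every relevant item appears somewhere in the ranking;
    MAP is the mean of this value over all queries."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(rels, 1):
        if rel:
            hits += 1
            score += hits / rank
    return score / max(hits, 1)

def ndcg_at_k(rels: list[int], k: int = 10) -> float:
    """nDCG@k with binary gains and log2 position discounting."""
    dcg = sum(r / math.log2(i + 1) for i, r in enumerate(rels[:k], 1))
    idcg = sum(r / math.log2(i + 1)
               for i, r in enumerate(sorted(rels, reverse=True)[:k], 1))
    return dcg / idcg if idcg else 0.0
```

For example, `token_f1("the aspirin drug", "aspirin")` gives 0.5 (precision 1/3, recall 1), while `exact_match` on the same pair gives 0 — which is why F1 scores in the table typically run well above EM scores for the same dataset.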