| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | BioASQ | Accuracy | 98.32 | 72 |
| Medical Question Answering | BioASQ | Accuracy | 80.74 | 38 |
| Hallucination Detection | BioASQ | AUROC | 81.13 | 28 |
| Selective Prediction | BioASQ | E-AURC | 0.2744 | 28 |
| Question Answering | BioASQ (dev) | F1 Score | 77.8 | 28 |
| Biomedical reasoning | BioASQ out-of-domain | Accuracy | 91.87 | 25 |
| Domain Adaptation | BioASQ (test) | BBH | 54.89 | 20 |
| Biomedical Multi-hop Question Answering | BioASQ-B | EM | 40.6 | 18 |
| Extractive Question Answering | BioASQ (test) | EM | 47.27 | 16 |
| Snippet Retrieval | BIOASQ 7 (test batches 1-5) | MAP | 0.2518 | 16 |
| Document Retrieval | BIOASQ 7 (test batches 1-5) | MAP | 19.24 | 16 |
| Question Answering | BioASQ MRQA out-of-domain evaluation 2019 (test) | EM | 60.3 | 15 |
| Reading Comprehension | BioASQ MRQA out-of-domain | EM | 67.62 | 14 |
| Question Answering | BioASQ factoid 7b (test) | SAcc | 47.4 | 13 |
| Extractive Question Answering | BioASQ MRQA | F1 Score | 91 | 12 |
| Biomedical Question Answering | BioASQ | Factoid Acc | 29 | 11 |
| Question Answering | BioASQ | SAME_CONCLUSION Score | 85.71 | 10 |
| Retrieval | BioASQ (test) | Top-20 | 46 | 9 |
| Biomedical Question Answering | BioASQ (test) | ROUGE | 54.8 | 8 |
| Question Answering | BioASQ MRQA Out-of-domain | F1 Score | 49.37 | 8 |
| Document Classification | BioASQ | Macro F1 | 71.28 | 8 |
| Medical Question Answering | BioASQ (test) | ROUGE-1 | 28.55 | 8 |
| Question Answering Retrieval | BioASQ | nDCG@10 | 76.9 | 8 |
| Generative Question Answering | BioASQ (test) | EM | 43.01 | 8 |
| Question Answering | BioASQ Task B 14 2026 challenge edition (sampled 1000 factoid questions) | ECE | 0.071 | 7 |