| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| PubMedQA | Nemotron-CC_ASI | Accuracy 68.32 | 13 | 17d ago | |
| BioASQ | LLAMA-70B+7B (PT) | Factoid Acc 29 | 11 | 1mo ago | |
| PubMedQA PQA-L In-Domain (test) | Human (expert) | Accuracy 78 | 11 | 1mo ago | |
| MedMCQA In-Domain (test) | Human (expert) | Accuracy 90 | 10 | 1mo ago | |
| BioASQ (test) | GRIP | ROUGE 54.8 | 8 | 4d ago | |
| BioMRC TINY Setting A (test) | AOA-READER WITH BIOBERT EMBEDDING | Accuracy 93.33 | 8 | 1mo ago | |
| BioACE human-assessed TREC BioGen 2025 | | Nugget Precision 96.41 | 7 | 1mo ago | |
| BioMRC LITE Setting A (test) | AOA-READER WITH BIOBERT EMBEDDING | Accuracy 86.74 | 7 | 1mo ago | |
| BioMRC LITE Setting A (dev) | AOA-READER WITH BIOBERT EMBEDDING | Accuracy 87.22 | 7 | 1mo ago | |
| BioACE automatic evaluation N=50 runs | | Precision 94.96 | 6 | 1mo ago | |
| Four biomedical QA datasets macro-averaged (test) | Med42-Llama3-8B | Faithfulness 85.3 | 4 | 1mo ago | |
| PubMedQA (test) | MedBayes-Lite | CUS 0.254 | 2 | 17d ago | |
| MedQA (test) | | CUS 42.1 | 2 | 17d ago | |