| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| Long-form QA Short Q, Long A (test) | WorldCup | GPT4 Score | 6.182 | 15 | 4d ago |
| WaterBench (test) | SWEET | GM Score | 24.06 | 11 | 4d ago |
| MultiMed-X ZU | MED-COREASONER | Overall Score | 4.42 | 5 | 4d ago |
| MultiMed-X YO | MED-COREASONER | Overall Score | 4.45 | 5 | 4d ago |
| MultiMed-X TH | MED-COREASONER | Overall Score | 4.66 | 5 | 4d ago |
| MultiMed-X SW | MED-COREASONER | Overall Score | 4.55 | 5 | 4d ago |
| MultiMed-X KO | MED-COREASONER | Overall Score | 4.54 | 5 | 4d ago |
| MultiMed-X JP | MED-COREASONER | Overall Score | 4.43 | 5 | 4d ago |
| MultiMed-X ZH | MED-COREASONER | Overall Score | 4.53 | 5 | 4d ago |
| MultiMed-X EN | MED-COREASONER | Overall Score | 4.6 | 5 | 4d ago |
| BioASQ (test) | Fine-Tuned GPT-4o + MedBioRAG | ROUGE-1 | 34.3 | 4 | 4d ago |
| PubMedQA (test) | Fine-Tuned GPT-4o + MedBioRAG | ROUGE-1 | 37.49 | 4 | 4d ago |
| MedicationQA (test) | Fine-Tuned GPT-4o + MedBioRAG | ROUGE-1 | 27.73 | 4 | 4d ago |
| LiveQA (test) | GPT-4o + MedBioRAG | ROUGE-1 | 27.33 | 4 | 4d ago |