| Dataset Name | SOTA Method | Metric | Value | Trend | | |
|---|---|---|---|---|---|---|
| CT-RATE | U-VLM | F1 Score | 41.4 | 26 | 2d ago | |
| MIMIC | | Factual Consistency (FC) | 54.9 | 20 | 22d ago | |
| IU-XRay | | Factual Consistency (FC) | 85.71 | 20 | 22d ago | |
| MIMIC-CXR (test) | MOTOR | BLEU-4 | 15.6 | 20 | 3mo ago | |
| DeepResearch Bench 2025 (test) | | Comprehensiveness | 49.5 | 16 | 3mo ago | |
| MIMIC-CXR-JPG (test) | VILA-M3-13B | BLEU-4 | 21.6 | 16 | 3mo ago | |
| WSI-Bench | MLLM-HWSI | BLEU-1 | 55.6 | 15 | 2mo ago | |
| Heartcare-Bench I (test) | | ScoreGPT | 78.8 | 14 | 1mo ago | |
| Heartcare-Bench S (test) | HeartcareGPT-7B | ScoreGPT | 76.55 | 14 | 1mo ago | |
| TMALL | RecPilot | Accuracy | 4.6 | 14 | 2mo ago | |
| L-MIMIC | Maira2 | Precision | 61.5 | 14 | 3mo ago | |
| FetUS | FetUSAgents | AC Score | 1.2418 | 13 | 8d ago | |
| MIMIC IV | GEM | METEOR | 35.06 | 12 | 3mo ago | |
| WSI | HistoSelect | BLEU-1 | 43.1 | 12 | 3mo ago | |
| MMTT | ForgeryTalker | CIDEr | 59.3 | 11 | 1mo ago | |
| CT-RATE (val) | CT-Agent | BLEU-1 | 50.2 | 11 | 2mo ago | |
| IU-Xray | VALOR | ROUGE-L | 33.1 | 10 | 3mo ago | |
| HistGen | MLLM-HWSI | BLEU-1 | 66.7 | 9 | 2mo ago | |
| Mental Health Social Media Twitter | GPT-traj | Trajectory Coverage | 3.9 | 8 | 19d ago | |
| eRisk Reddit 2018 | GPT-traj | Trajectory Coverage | 4.9 | 8 | 19d ago | |
| DTU (test) | Eyes + Bridge + QLoRA + RAFT | BLEU-4 | 41 | 7 | 7d ago | |
| ReXGradient-160K External Validation (test) | CheXmix | GREEN | 21.7 | 7 | 1mo ago | |
| Radiology Report Generation | RadAgents | CheXbert Macro F1 (14) | 53.2 | 6 | 1mo ago | |
| IXI | LLaBIT | ROUGE | 37.33 | 6 | 1mo ago | |
| ATLAS 2.0 | LLaBIT | ROUGE | 33.69 | 6 | 1mo ago | |