| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Error Detection | CRAG multi-hop subset (train) | Precision: 92 | 36 |
| Error Detection | CRAG | F1 Score: 91 | 36 |
| Gland Segmentation | CRAG | F1 Score: 87.4 | 19 |
| Multimodal Retrieval-Augmented Generation | CRAG-MM (Overall) | Truthfulness: 20.5 | 18 |
| Question Answering | CRAG | Finance Score: 20.1 | 12 |
| Nuclei Instance Segmentation | CRAG Dpath (test) | Dice: 0.785 | 8 |
| Gland Segmentation | CRAG (test) | F1 Score: 86.9 | 7 |
| Question Answering | CRAG (test) | P@1: 63.3 | 6 |
| Retrieval-Augmented Generation | CRAG | Finance Accuracy: 16.4 | 5 |