| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Error Detection | CRAG multi-hop subset (train) | Precision | 92 | 36 |
| Error Detection | CRAG | F1 Score | 91 | 36 |
| Gland Segmentation | CRAG (test) | Dice Score | 89.4 | 26 |
| Gland Segmentation | CRAG | F1 Score | 87.4 | 19 |
| Multimodal Retrieval-Augmented Generation | CRAG-MM (Overall) | Truthfulness | 20.5 | 18 |
| Question Answering | CRAG | Finance Score | 20.1 | 12 |
| Nuclei Instance Segmentation | CRAG Dpath (test) | Dice | 0.785 | 8 |
| Question Answering | CRAG (test) | P@1 | 63.3 | 6 |
| Semantic Segmentation | CRAG | Dice Score | 88.58 | 5 |
| Retrieval-Augmented Generation | CRAG | Finance Accuracy | 16.4 | 5 |
| Multi-source Answer Generation | CRAG Task 2 (test) | Accuracy (%) | 41 | 3 |
| Answer Generation (Unstructured Context) | CRAG Task 1 (test) | Accuracy | 34.23 | 3 |