| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Image Classification | UCM (test) | Overall Accuracy | 97.49 | 80 |
| Few-Shot Classification | UCM | Accuracy | 89.85 | 54 |
| Scene Classification | UCM 1.0 (50% train ratio) | Accuracy | 99.81 | 43 |
| Scene Classification | UCM 1.0 (train) | Accuracy | 99.88 | 31 |
| Scene Classification | UCM | Top-1 Accuracy | 98.81 | 28 |
| Image-to-Text Retrieval | UCM (test) | Recall@1 | 19.33 | 27 |
| Text-to-Image Retrieval | UCM (test) | R@1 | 22.71 | 27 |
| Image captioning | UCM Captions | BLEU-4 | 79.92 | 19 |
| Image Classification | UCM | Accuracy | 98.2 | 14 |
| Cross-modal Retrieval | UCM (test) | R@1 (I2T) | 28.57 | 12 |
| Text-to-image Retrieval | UCM caption | R@1/5/10 | 75.33 | 11 |
| Image-text retrieval | UCM | I2T R@1 | 20.48 | 9 |
| Text-to-Image Retrieval | UCM (val) | Recall@5 | 43.5 | 8 |
| Image-to-Text Retrieval | UCM (val) | R@5 | 45.7 | 8 |
| Text-to-image retrieval | UCM | R@1 | 11.1 | 8 |
| Image-to-text retrieval | UCM | Recall@1 | 11.6 | 8 |
| Classification | UCM-CLS (test) | Top-1 Accuracy | 66.7 | 8 |
| Scene Classification | UCM (test) | Top-1 Acc | 98.57 | 7 |
| Image-to-text Retrieval | UCM caption | Avg Recall (R@1/5/10) | 52.86 | 6 |
| Image Captioning | UCM | BLEU-1 | 0.918 | 6 |
| Dataset Distillation | UCM | Training Time (h) | 13.75 | 5 |