| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Cross-modal Retrieval | InstVL video (Global) | T2V R@1 | 94.5 | 12 |
| Cross-modal Retrieval | InstVL video (Instance) | T2V R@1 | 60.63 | 12 |
| Cross-modal Retrieval | InstVL img-zero 10K (Global) | T2V R@1 | 83.33 | 12 |
| Cross-modal Retrieval | InstVL img-zero 10K | T2V R@1 | 28.25 | 12 |
| Cross-modal Retrieval | InstVL img-zero 1K (Global) | T2V R@1 | 88.7 | 12 |
| Cross-modal Retrieval | InstVL img-zero 1K (Instance) | T2V R@1 | 41.94 | 12 |
| Cross-modal Retrieval | InstVL img 10K (Global) | T2V R@1 | 95.77 | 12 |
| Cross-modal Retrieval | InstVL img 10K | T2V R@1 | 44.05 | 12 |
| Cross-modal Retrieval | InstVL img 1K (Global) | T2V R@1 | 99.2 | 12 |
| Cross-modal Retrieval | InstVL img 1K (Instance) | T2V R@1 | 50.25 | 12 |