| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Referring Video Object Segmentation | MeViS (val) | J&F Score: 0.633 | 122 | |
| Referring Video Segmentation | MeViS | J&F Score: 49.5 | 50 | |
| Video Referring Expression Segmentation | MeViS (val-u) | J&F Score: 70.8 | 18 | |
| Referring Video Segmentation | MeViS (test) | J&F Score: 53.7 | 18 | |
| Referring Video Object Segmentation | MeViS v2 (val) | J&F: 43.9 | 8 | |
| Referring Video Object Segmentation | MeViS v1 (val) | J&F Score: 47.6 | 8 | |
| Audio-Guided Video Object Segmentation | MeViS v2 | J&F Score: 42.3 | 6 | |
| Video Object Grounding | MeViS | J Score: 62.3 | 6 | |
| Text-to-Video Retrieval | MeViS | Recall@1: 55.6 | 6 | |
| Video-to-Text Retrieval | MeViS | R@1 (V2T): 58.4 | 6 | |
| Video-to-Text Retrieval | MeViS (test) | R@1: 59.2 | 5 | |
| Text-to-Video Retrieval | MeViS (test) | R@1: 0.568 | 5 | |
| Trajectory Generation | MeViS (test) | AJ: 0.28 | 5 | |
| Text-to-Trajectory Retrieval | MeViS | Recall@1: 34.2 | 5 | |
| Referring Multi-Object Tracking | MeViS v2 | HOTA*: 38.8 | 4 | |
| Referring Motion Expression Generation | MeViS | METEOR: 15.68 | 4 | |
| Spatial Grounding | MeViS (val) | J Score: 62.3 | 3 | |
| Referring Multi-Object Tracking | MeViS v2 (test) | HOTA*: - | 0 | |
| Audio-Guided Video Object Segmentation | MeViS v2 (test) | J&F Score: - | 0 |