| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Referring Video Object Segmentation | MeViS (val) | J&F Score 0.635 | 166 |
| Referring Video Segmentation | MeViS | J&F Score 73.1 | 101 |
| Referring Video Segmentation | MeViS (test) | J&F Score 53.7 | 25 |
| Referring Video Object Segmentation | MeViS v1 | J&F Score 62.4 | 19 |
| Video Referring Expression Segmentation | MeViS (val-u) | J&F Score 70.8 | 18 |
| Referring Video Object Segmentation | MeViS (val-u) | J&F Score 72.2 | 17 |
| Referring Video Object Segmentation | MeViS v2 (val) | J&F 60.1 | 16 |
| Motion localization | MeViS | SL Score 68 | 15 |
| Referring Video Object Segmentation | MeViS v1 (val) | J&F Score 47.6 | 8 |
| Video Object Segmentation | MeViS | J&F Score 52.2 | 7 |
| Referring Video Object Segmentation and Point-to-Mask Tracking | MeViS-U | F-Score 37.1 | 6 |
| Referring Video Object Segmentation | MeViS-Text PVUW 2026 (test) | J&F Score 78.97 | 6 |
| Audio-Guided Video Object Segmentation | MeViS v2 | J&F Score 42.3 | 6 |
| Video Object Grounding | MeViS | J Score 62.3 | 6 |
| Text-to-Video Retrieval | MeViS | Recall@1 55.6 | 6 |
| Video-to-Text Retrieval | MeViS | R@1 (V2T) 58.4 | 6 |
| Audio-guided video object segmentation | MeVis Audio 5th PVUW Challenge V2 (Challenge Track) | J&F Score 67 | 5 |
| Referring Video Object Segmentation | MeViS text | J&F Score 78.97 | 5 |
| Audio-referring Video Object Segmentation | MeViS Audio (leaderboard) | J&F Score 67 | 5 |
| Video-to-Text Retrieval | MeViS (test) | R@1 59.2 | 5 |
| Text-to-Video Retrieval | MeViS (test) | R@1 0.568 | 5 |
| Trajectory Generation | MeViS (test) | AJ 0.28 | 5 |
| Text-to-Trajectory Retrieval | MeViS | Recall@1 34.2 | 5 |
| Video Reasoning Segmentation | MeViS | FPS 5.1 | 4 |
| Referring Multi-Object Tracking | MeViS v2 | HOTA* 38.8 | 4 |