| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Referring Video Object Segmentation | A2D-Sentences | oIoU: 82.1 | 57 |
| Text-based Video Segmentation | A2D Sentences | mAP (0.5:0.95): 41.2 | 11 |
| Actor and Action Segmentation | A2D-S (val) | oIoU: 78.7 | 10 |
| Referring Video Segmentation | A2D Sentences | P@0.5: 50 | 9 |
| Actor Semantic Segmentation | A2D (test) | Class-Avg Pixel Acc: 73.7 | 8 |
| Action Semantic Segmentation | A2D (test) | mIoU: 49.4 | 7 |
| Language-guided Video Object Segmentation | A2D (test) | Precision@0.5: 57.8 | 5 |
| Referring Video Object Segmentation | A2D-S single actor subset (test) | mAP: 0.336 | 4 |