| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Referring Image Segmentation | Mug19 1.0 (All) | mIoU | 75.1 | 12 |
| Text-to-Video Generation | MUG | Image Similarity (IS) | 5.94 | 7 |
| Referring Image Segmentation | Mug19 Semantic Distractor 1.0 | mIoU | 78.8 | 6 |
| Semantic Segmentation | Mug19 variant | mIoU | 71.9 | 6 |
| Video Generation | MUG (test) | FID | 17.64 | 6 |
| Sequence Prediction | MUG (test) | Accuracy | 92.9 | 5 |
| Disentangled Video Generation | MUG 15% holdout (test) | Accuracy Consistency | 93.33 | 4 |
| Multimodal Video Generation | MUG | Gender Accuracy | 98.14 | 2 |