| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Image-to-Video Generation | OpenVid-1M (val) | FVD 156.94 | 12 |
| Video Captioning Evaluation | OpenVid-1M | CLIP Score 0.7998 | 12 |
| Video Annotation | OpenVid-1M | Length 224.23 | 12 |
| Video Generation | OpenVid (test) | LPIPS 0.113 | 7 |
| Text-to-Video Generation | OpenVid-1M (fine-tuning) | VQAA 60.43 | 6 |
| Text-to-Image Generation | OpenVid 80 samples 1.0 (test) | SSCD 0.3315 | 4 |
| Sketch-based Video Generation | OpenVid 1M (200 random examples) | LPIPS 27.56 | 4 |
| Text-to-Image Generation | OpenVid 1.0 (test) | SSCD 62.05 | 2 |
| Text-to-Image Generation | OpenVid 395 samples 1.0 (test) | SSCD 0.5918 | 2 |
| Text-to-Image Generation | OpenVid Single-image overfitting 1.0 (test) | SSCD 0.7298 | 2 |