| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Person Re-identification | GRID | Rank-1 Acc 56.9 | 44 | |
| Person Re-Identification | GRID (test) | Rank-1 Acc 57.2 | 24 | |
| Person Re-identification | GRID (target) | mAP 60.1 | 20 | |
| Graph Generation | GRID | Degree Similarity 45.5 | 19 | |
| Lip-reading | GRID (test) | WER 1.09 | 18 | |
| Person Re-identification | GRID Protocol-1 | mAP 68.1 | 16 | |
| Generic Graph Generation | Grid Synthetic, 100 ≤ \|V\| ≤ 400 (test) | Degree Similarity 1.996 | 12 | |
| Adaptive Traffic Signal Control | Grid4x4 | Average Trip Time (s) 159.07 | 12 | |
| Multi-speaker Dubbing | GRID Dub 1.0 (test) | SPK-SIM (%) 100 | 12 | |
| Person Re-identification | GRID G (test) | R1 56.4 | 12 | |
| Video-to-Speech Synthesis | GRID (test) | Sim-O 0.87 | 11 | |
| Link Prediction | Grid probe (test) | AUC 0.639 | 11 | |
| Visual Text-to-Speech | GRID | WER 10.9 | 10 | |
| Movie Dubbing | GRID Dubbing Setting 2.0 | LSE-C 7.134 | 10 | |
| Movie Dubbing | GRID Dubbing Setting 1.0 | LSE-C 7.13 | 10 | |
| Video Dubbing | GRID Setting 2.0 (test) | LSE-C 7.28 | 8 | |
| Video Dubbing | GRID Setting 1.0 (test) | LSE-C 5.23 | 8 | |
| Constrained Reinforcement Learning | Grid | Episodic Reward 276.3 | 8 | |
| Speech Reconstruction | GRID (speaker-dependent) | STOI 0.738 | 7 | |
| Person Re-identification | GRID P=900 (test) | Rank-1 16.56 | 7 | |
| Dubbing | GRID | DD 0 | 6 | |
| Movie Dubbing | GRID2V2C | DD (Sync Error) 0 | 6 | |
| Graph Generation | Grid (test) | Train Time (s) 0.28 | 6 | |
| Video-Driven Text-to-Speech | GRID standard (test) | LSE-C 7.68 | 6 | |
| Lipreading | GRID | WER 2.9 | 6 | |