| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Dance Motion Generation | AIST++ | BAS 0.661 | 13 |
| Global Human Motion Recovery | AIST multi-shot (test) | PA-MPJPE 36.82 | 12 |
| Sparse keyframe in-betweening | AIST++ | FID 0.032 | 12 |
| 3D Human Pose Estimation | AIST | MPJPE 87.44 | 10 |
| 3D pose estimation | AIST++ Dance Video Dataset (test) | MPJPE (mm) 73.7 | 8 |
| 3D Human Pose Estimation | AIST (subset) | MPJPE 89.1 | 5 |
| Human 3D Mesh Recovery | AIST 43 (test) | PA-MPJPE 74.1 | 5 |
| Vector Quantization | AIST++ | Activation (%) 72 | 4 |
| Novel View Synthesis | AIST novel models and novel poses | PSNR 19.03 | 3 |
| Novel view synthesis | AIST (test) | PSNR 19.03 | 3 |
| Text-to-Music | AIST++ | BCS 74.5 | 3 |
| Video reconstruction from a single motion-blurred image | B-AIST++ | PSNR 27.37 | 2 |
| Music-to-Text | AIST++ | BLEU@4 11.95 | 2 |
| Joint audio-video generation | AIST++ (test) | AV Alignment 74 | 1 |
| Audio-to-video generation | AIST++ subset (test) | AV Alignment 0.77 | 1 |
| AV interpolation | AIST++ subset (test) | AV Alignment 69 | 1 |