Skeleton-aided Articulated Motion Generation
About
This work makes the first attempt to generate articulated human motion sequences from a single image. First, we use paired inputs, with human skeleton information as the motion embedding and a single human image as the appearance reference, to generate novel motion frames with a conditional GAN architecture. Second, we employ a triplet loss to encourage appearance smoothness between consecutive frames. Because the proposed framework jointly exploits the image appearance space and the articulated/kinematic motion space, it generates realistic articulated motion sequences, in contrast to most previous video generation methods, which yield blurred motion effects. We test our model on two human action datasets, KTH and Human3.6M, and the proposed framework generates promising results on both.
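The abstract does not spell out the exact form of the triplet loss, so the following is only a minimal sketch of one plausible reading: each frame serves as an anchor, its immediate successor as the positive, and a frame further along the sequence as the negative. The function names, the `gap` and `margin` parameters, and the use of squared Euclidean distance on flattened frames are all assumptions made for illustration, not the authors' implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two flattened frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: the positive should be closer to the
    anchor than the negative is, by at least `margin` (assumed value)."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def sequence_smoothness_loss(frames, gap=3, margin=1.0):
    """Average triplet loss over a sequence of flattened frames.

    Assumption: frame t is the anchor, frame t+1 the positive, and
    frame t+gap the negative, so adjacent frames are pushed to look
    more alike than frames that are further apart in time.
    """
    terms = [
        triplet_loss(frames[t], frames[t + 1], frames[t + gap], margin)
        for t in range(len(frames) - gap)
    ]
    return sum(terms) / len(terms) if terms else 0.0
```

Under these assumptions, a sequence whose appearance drifts steadily incurs no penalty, while a sequence in which a frame differs sharply from its neighbour is penalized. In a real training setup this term would be computed on image tensors or learned features and added to the conditional GAN objective with some weighting.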
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Hand gesture-to-gesture translation | Senz3D (test) | FID | 38.1758 | 11 |
| Hand Gesture Recognition | Senz3D 30% (test) | Accuracy | 99.495 | 6 |
| Hand Gesture Recognition | NTU Hand Digit (test) | Accuracy | 95.333 | 6 |
| Hand Gesture Image Generation | Senz3D 27 (test) | MSE | 175.9 | 5 |
| Hand Gesture Image Generation | NTU Hand Digit 22 (test) | MSE | 118.1 | 5 |
| Hand gesture-to-gesture translation | NTU Hand Digit | AMT Perceptual Score | 2.6 | 5 |
| Hand gesture-to-gesture translation | Senz3D | AMT Score (%) | 2.3 | 5 |
| Hand gesture-to-gesture translation | NTU Hand Digit (test) | FID | 31.2841 | 5 |