
Structure-Aware Human-Action Generation

About

Generating long-range skeleton-based human actions is challenging because a small deviation in a single frame can derail the entire action sequence. Most existing methods borrow ideas from video generation and naively treat skeleton nodes/joints as image pixels, ignoring the rich inter-frame and intra-frame structural information and thus producing potentially distorted actions. Graph convolutional networks (GCNs) are a promising way to leverage structural information and learn structure-aware representations. However, directly applying GCNs to such continuous action sequences in both the spatial and temporal domains is difficult, as the resulting action graph can be huge. To overcome this issue, we propose a variant of GCNs that leverages the self-attention mechanism to adaptively sparsify the complete action graph in the temporal domain. Our method dynamically attends to important past frames and constructs a sparse graph for the GCN framework, capturing the structural information in action sequences well. Extensive experiments on two standard human action datasets demonstrate the superiority of our method over existing methods.
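The core idea of the abstract — attending to important past frames and keeping only the strongest connections to form a sparse temporal graph for graph convolution — can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the authors' implementation: the function names, the top-k sparsification rule, and the single ReLU GCN layer are all illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_temporal_adjacency(frames, k=3):
    """Attend each frame to its past frames and keep only the top-k
    attention weights, yielding a sparse temporal adjacency matrix.

    frames: (T, d) array of per-frame skeleton features.
    """
    T, d = frames.shape
    scores = frames @ frames.T / np.sqrt(d)        # (T, T) dot-product attention
    # Causal mask: frame t may only attend to frames <= t.
    mask = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    attn = softmax(scores, axis=-1)
    # Sparsify: zero out all but the k largest weights in each row.
    adj = np.zeros_like(attn)
    for t in range(T):
        keep = np.argsort(attn[t])[-k:]
        adj[t, keep] = attn[t, keep]
    # Renormalize so each frame's incoming weights sum to 1.
    adj /= adj.sum(axis=-1, keepdims=True)
    return adj

def gcn_layer(features, adj, weight):
    """One graph-convolution step over the sparse temporal graph."""
    return np.maximum(adj @ features @ weight, 0.0)  # ReLU activation

rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 16))                 # 10 frames, 16-d features
adj = sparse_temporal_adjacency(frames, k=3)       # sparse (10, 10) graph
out = gcn_layer(frames, adj, rng.normal(size=(16, 16)))
```

Each row of `adj` has at most `k` nonzero entries, all pointing to the current frame or its past, so the graph convolution stays cheap even as the sequence length grows — which is the motivation the abstract gives for sparsifying the temporal graph.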

Ping Yu, Yang Zhao, Chunyuan Li, Junsong Yuan, Changyou Chen · 2020

Related benchmarks

Task | Dataset | Metric | Result | Rank
Action Generation | Human3.6M | MMD (Average) | 0.146 | 16
Local human action synthesis | Human3.6M | MMDa | 0.146 | 9
Action Sequence Generation | Human3.6M | Directions Error | 0.42 | 8
Local human action synthesis | NTU-2D RGB+D (Cross-Subject) | MMDa | 0.285 | 6
Local human action synthesis | NTU-2D RGB+D (Cross-View) | MMDa | 0.316 | 6
Human Action Generation | NTU RGB+D (Cross-View) | MMDavg | 0.316 | 4
Human Action Generation | NTU RGB+D (Cross-subject) | MMD (Average) | 0.285 | 4

Other info

Code
