SiT-MLP: A Simple MLP with Point-wise Topology Feature Learning for Skeleton-based Action Recognition
About
Graph convolution networks (GCNs) have achieved remarkable performance in skeleton-based action recognition. However, previous GCN-based methods rely on elaborate human priors excessively and construct complex feature aggregation mechanisms, which limits the generalizability and effectiveness of networks. To solve these problems, we propose a novel Spatial Topology Gating Unit (STGU), an MLP-based variant without extra priors, to capture the co-occurrence topology features that encode the spatial dependency across all joints. In STGU, to learn the point-wise topology features, a new gate-based feature interaction mechanism is introduced to activate the features point-to-point by the attention map generated from the input sample. Based on the STGU, we propose the first MLP-based model, SiT-MLP, for skeleton-based action recognition in this work. Compared with previous methods on three large-scale datasets, SiT-MLP achieves competitive performance. In addition, SiT-MLP reduces the parameters significantly with favorable results. The code will be available at https://github.com/BUPTSJZhang/SiT?MLP.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Action Recognition | NTU RGB+D 120 (X-set) | Accuracy90.2 | 661 | |
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy96.8 | 575 | |
| Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy92.3 | 305 | |
| Action Recognition | NTU RGB+D 120 Cross-Subject | Accuracy89 | 183 | |
| Skeleton-based Action Recognition | NW-UCLA | -- | 44 |