Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
About
Graph convolutional networks (GCNs) have been widely used in skeleton-based action recognition and have achieved remarkable results. In GCNs, the graph topology dominates feature aggregation and is therefore key to extracting representative features. In this work, we propose a novel Channel-wise Topology Refinement Graph Convolution (CTR-GC) to dynamically learn different topologies and effectively aggregate joint features in different channels for skeleton-based action recognition. The proposed CTR-GC models channel-wise topologies by learning a shared topology as a generic prior for all channels and refining it with channel-specific correlations for each channel. Our refinement method introduces few extra parameters and significantly reduces the difficulty of modeling channel-wise topologies. Furthermore, by reformulating graph convolutions into a unified form, we find that CTR-GC relaxes the strict constraints of graph convolutions, leading to stronger representation capability. Combining CTR-GC with temporal modeling modules, we develop a powerful graph convolutional network named CTR-GCN, which notably outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
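The abstract describes CTR-GC only at a high level: a shared topology acts as a prior for every channel, and each channel refines it with its own feature-dependent correlations before aggregating joint features. The plain-Python sketch below illustrates that idea for a single frame. It is a hypothetical illustration, not the authors' implementation. The correlation function (tanh of the difference between two reduced projections `psi` and `phi`), the lift `xi` to output channels, the additive refinement with a scalar `alpha`, and all function names are assumptions made for this example. It also omits batching, the temporal dimension, normalization, and activations.

```python
import math


def matmul_rows(rows, weight):
    """Multiply each row vector in `rows` (length C_in) by `weight` (C_in x C_out)."""
    c_out = len(weight[0])
    return [[sum(r[c] * weight[c][k] for c in range(len(r))) for k in range(c_out)]
            for r in rows]


def ctr_gc_sketch(x, a_shared, w, psi, phi, xi, alpha):
    """Hypothetical single-frame sketch of channel-wise topology refinement.

    x:        N x C input joint features
    a_shared: N x N shared topology (generic prior for all channels)
    w:        C x C_out feature transform
    psi, phi: C x C_r reductions used to compute joint-pair correlations (assumed)
    xi:       C_r x C_out lift from reduced to output channels (assumed)
    alpha:    scalar weight of the channel-specific refinement (assumed)
    Returns N x C_out aggregated features.
    """
    n = len(x)
    x_t = matmul_rows(x, w)   # N x C_out transformed features
    p = matmul_rows(x, psi)   # N x C_r reduced features (query side)
    q = matmul_rows(x, phi)   # N x C_r reduced features (key side)
    c_r, c_out = len(xi), len(xi[0])
    z = [[0.0] * c_out for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Assumed correlation between joints i and j in the reduced space.
            corr = [math.tanh(p[i][r] - q[j][r]) for r in range(c_r)]
            for k in range(c_out):
                # Channel-specific correlation for output channel k.
                q_k = sum(corr[r] * xi[r][k] for r in range(c_r))
                # Refined topology for channel k: shared prior plus scaled refinement.
                r_k = a_shared[i][j] + alpha * q_k
                # Aggregate neighbor j's channel-k feature with channel k's own weight.
                z[i][k] += r_k * x_t[j][k]
    return z


if __name__ == "__main__":
    # Toy 3-joint chain (0-1-2) with self-loops as the shared topology prior.
    a = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
    x = [[1.0, 0.5], [0.0, -1.0], [2.0, 1.0]]          # N=3 joints, C=2 channels
    w = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # C=2 -> C_out=4
    psi = [[1.0, 0.0], [0.0, 1.0]]
    phi = [[0.5, 0.0], [0.0, 0.5]]
    xi = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
    z = ctr_gc_sketch(x, a, w, psi, phi, xi, alpha=1.0)
    print(len(z), len(z[0]))  # 3 4
```

Setting `alpha=0` collapses every channel onto the shared topology, which reduces the sketch to an ordinary graph convolution with a single adjacency matrix. A nonzero `alpha` lets each output channel weight its neighbors differently, which is the channel-wise behavior the abstract describes, while the shared prior keeps the number of extra parameters small.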
Related benchmarks
| Task | Dataset | Metric | Result (%) | Rank |
|---|---|---|---|---|
| Action Recognition | NTU RGB+D 120 (X-set) | Accuracy | 90.6 | 661 |
| Action Recognition | NTU RGB+D (Cross-View) | Accuracy | 96.8 | 609 |
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy | 96.8 | 575 |
| Action Recognition | NTU RGB+D (Cross-subject) | Accuracy | 92.4 | 474 |
| Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy | 92.4 | 467 |
| Action Recognition | NTU RGB+D X-sub 120 | Accuracy | 88.9 | 377 |
| Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy | 93.9 | 305 |
| Skeleton-based Action Recognition | NTU 60 (X-sub) | Accuracy | 92.7 | 220 |
| Skeleton-based Action Recognition | NTU RGB+D (Cross-View) | Accuracy | 96.8 | 213 |
| Skeleton-based Action Recognition | NTU RGB+D 120 (X-set) | Top-1 Accuracy | 90.7 | 184 |