Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN
About
Hand Gesture Recognition (HGR) enables intuitive human-computer interaction in a variety of real-world contexts. However, existing frameworks often struggle to meet the real-time requirements essential for practical HGR applications. This study introduces a robust, skeleton-based framework for dynamic HGR that reduces the recognition of dynamic hand gestures to a static image classification task, lowering both hardware and computational demands. The framework uses a data-level fusion technique to encode 3D skeleton data from dynamic gestures into static RGB spatiotemporal images. It incorporates a specialized end-to-end Ensemble Tuner (e2eET) Multi-Stream CNN architecture that optimizes the semantic connections between data representations while minimizing computational needs. Tested on five benchmark datasets (SHREC'17, DHG-14/28, FPHA, LMDHG, and CNR), the framework achieved performance competitive with the state of the art. Its ability to support real-time HGR applications was also demonstrated through deployment on standard consumer PC hardware, where it showed low latency and minimal resource usage in real-world settings. The successful deployment underscores the framework's potential to enhance real-time applications in fields such as virtual/augmented reality, ambient intelligence, and assistive technologies, providing a scalable and efficient solution for dynamic gesture recognition.
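To make the idea of encoding a skeleton sequence as a static image more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' encoding: the function name `encode_skeleton_sequence`, the orthographic projection onto the x-y plane, the 64-pixel canvas, and the red-to-blue temporal color ramp are all assumptions chosen for illustration. It only shows how a sequence of shape (frames, joints, 3) can be collapsed into one RGB image in which color carries time.

```python
import numpy as np


def encode_skeleton_sequence(seq, size=64):
    """Rasterize a (T, J, 3) skeleton sequence into one (size, size, 3) RGB image.

    Hypothetical illustration: joints are projected onto the x-y plane and
    colored by frame index (first frame red, last frame blue).
    """
    seq = np.asarray(seq, dtype=np.float64)
    num_frames = seq.shape[0]

    # Orthographic projection: drop the depth (z) coordinate.
    xy = seq[..., :2]

    # Normalize the whole sequence into pixel coordinates [0, size - 1].
    lo = xy.min(axis=(0, 1))
    hi = xy.max(axis=(0, 1))
    span = np.maximum(hi - lo, 1e-9)  # avoid division by zero
    px = np.rint((xy - lo) / span * (size - 1)).astype(int)

    image = np.zeros((size, size, 3), dtype=np.uint8)
    for t in range(num_frames):
        # Temporal position in [0, 1]; a single-frame sequence maps to 0.
        a = t / max(num_frames - 1, 1)
        color = np.array([255 * (1 - a), 0, 255 * a]).astype(np.uint8)
        cols, rows = px[t, :, 0], px[t, :, 1]
        image[rows, cols] = color  # later frames overwrite earlier ones
    return image


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gesture = rng.normal(size=(30, 22, 3))  # 30 frames, 22 joints (assumed)
    img = encode_skeleton_sequence(gesture)
    print(img.shape, img.dtype)
```

The resulting array can be passed to any standard image classifier, which is the sense in which the dynamic recognition problem becomes a static image classification task. A fuller encoding would also draw the bones between joints and could render several view orientations, one per CNN stream.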
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Hand Gesture Recognition | SHREC 2017 (val) | Accuracy (14G) | 97.86 | 15 |
| Hand Gesture Recognition | DHG1428 (val) | Accuracy (14G) | 95.83 | 13 |
| Hand Gesture Recognition | FPHA (val) | Accuracy | 91.83 | 10 |
| Hand Gesture Recognition | FPHA 1:1 evaluation protocol (val) | Accuracy | 91.83 | 10 |
| Gesture Recognition | LMDHG (val) | Accuracy | 98.97 | 8 |
| Human Action Recognition | SBUKID (Cross-Validation) | Accuracy | 93.96 | 5 |
| Hand Gesture Recognition | CNR (val) | Accuracy | 97.05 | 4 |