Spatio-Temporal Covariance Descriptors for Action and Gesture Recognition
About
We propose a new action and gesture recognition method based on spatio-temporal covariance descriptors and a weighted Riemannian locality preserving projection approach that takes into account the curved space formed by the descriptors. The weighted projection is then exploited during boosting to create a final multiclass classification algorithm that employs the most useful spatio-temporal regions. We also show how the descriptors can be computed quickly through the use of integral video representations. Experiments on the UCF sport, CK+ facial expression and Cambridge hand gesture datasets indicate superior performance of the proposed method compared to several recent state-of-the-art techniques. The proposed method is robust and does not require additional processing of the videos, such as foreground detection, interest-point detection or tracking.
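As a rough illustration of the integral video idea mentioned above, the sketch below computes the covariance descriptor of any spatio-temporal box from precomputed cumulative sums of the per-voxel feature vectors and their outer products. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names (`integral_tensors`, `box_sum`, `region_covariance`), the `(T, H, W, d)` feature layout, and the half-open box convention are all hypothetical choices made for this example, and the abstract does not specify which features are used.

```python
import numpy as np

def integral_tensors(features):
    """Build zero-padded integral volumes from a (T, H, W, d) feature array.

    P accumulates the feature vectors, Q accumulates their outer products.
    (Layout and names are assumptions made for this sketch.)
    """
    T, H, W, d = features.shape
    outer = features[..., :, None] * features[..., None, :]  # (T, H, W, d, d)
    P = np.zeros((T + 1, H + 1, W + 1, d))
    Q = np.zeros((T + 1, H + 1, W + 1, d, d))
    P[1:, 1:, 1:] = features.cumsum(0).cumsum(1).cumsum(2)
    Q[1:, 1:, 1:] = outer.cumsum(0).cumsum(1).cumsum(2)
    return P, Q

def box_sum(I, t0, t1, y0, y1, x0, x1):
    """Sum over the half-open box [t0,t1) x [y0,y1) x [x0,x1) using
    3D inclusion-exclusion on the eight corners of the integral volume."""
    return (I[t1, y1, x1]
            - I[t0, y1, x1] - I[t1, y0, x1] - I[t1, y1, x0]
            + I[t0, y0, x1] + I[t0, y1, x0] + I[t1, y0, x0]
            - I[t0, y0, x0])

def region_covariance(P, Q, t0, t1, y0, y1, x0, x1):
    """Unbiased d x d covariance of the features inside the box."""
    n = (t1 - t0) * (y1 - y0) * (x1 - x0)
    p = box_sum(P, t0, t1, y0, y1, x0, x1)  # sum of feature vectors
    q = box_sum(Q, t0, t1, y0, y1, x0, x1)  # sum of outer products
    return (q - np.outer(p, p) / n) / (n - 1)

# Usage with placeholder random features standing in for real video features.
rng = np.random.default_rng(0)
F = rng.normal(size=(6, 8, 10, 4))
P, Q = integral_tensors(F)
C = region_covariance(P, Q, 1, 5, 2, 7, 3, 9)
```

Once `P` and `Q` are built, each region costs a fixed number of lookups regardless of its size, which is what makes evaluating many candidate regions during boosting affordable. The resulting matrices are symmetric positive semi-definite, which is why the abstract treats the descriptors as points on a curved space rather than as ordinary vectors.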
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Gesture Recognition | Cambridge (test) | Accuracy | 93 | 11 |
| Facial Expression Recognition | CK+ Extended Cohn-Kanade | Average Recognition Rate | 92.3 | 9 |
| Action Recognition | UCF 24 | Average Recognition Rate | 0.9391 | 4 |
| Hand Gesture Recognition | Cambridge hand gesture dataset (Set 2) | Recognition Rate | 94 | 4 |
| Hand Gesture Recognition | Cambridge hand gesture dataset (Set 3) | Recognition Rate | 94 | 4 |
| Hand Gesture Recognition | Cambridge hand gesture dataset (Set 4) | Recognition Rate | 93 | 4 |
| Hand Gesture Recognition | Cambridge hand gesture dataset (Overall) | Avg Recognition Rate | 0.93 | 4 |
| Hand Gesture Recognition | Cambridge hand gesture dataset (Set 1) | Recognition Rate | 92 | 4 |