
ActionFlowNet: Learning Motion Representation for Action Recognition

About

Even with the recent advances in convolutional neural networks (CNNs) in various visual recognition tasks, state-of-the-art action recognition systems still rely on hand-crafted motion features such as optical flow to achieve the best performance. We propose ActionFlowNet, a multitask learning model that trains a single-stream network directly from raw pixels to jointly estimate optical flow and recognize actions, capturing both appearance and motion in a single model. We additionally provide insights into how the quality of the learned optical flow affects action recognition. Our model improves action recognition accuracy by a large margin (31%) compared to state-of-the-art CNN-based action recognition models trained without external large-scale data and additional optical flow input. Without pretraining on large external labeled datasets, our model, by exploiting motion information, achieves recognition accuracy competitive with models trained on large labeled datasets such as ImageNet and Sports-1M.

Joe Yue-Hei Ng, Jonghyun Choi, Jan Neumann, Larry S. Davis • 2016
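
The abstract above describes a single network trained jointly on action classification and optical flow estimation. As a rough illustration of what such a multitask objective can look like, the sketch below adds a softmax cross-entropy term to a weighted mean endpoint error on the flow. The function names, the choice of endpoint error, and the `flow_weight` parameter are all assumptions made for this example; the abstract does not state the loss the authors actually use.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_endpoint_error(pred_flow, target_flow):
    """Average Euclidean distance between predicted and target (u, v) flow vectors."""
    dists = [
        math.hypot(pu - tu, pv - tv)
        for (pu, pv), (tu, tv) in zip(pred_flow, target_flow)
    ]
    return sum(dists) / len(dists)


def multitask_loss(logits, label, pred_flow, target_flow, flow_weight=1.0):
    """Hypothetical joint objective: classification loss plus weighted flow loss.

    flow_weight is an assumed hyperparameter balancing the two tasks.
    """
    return cross_entropy(logits, label) + flow_weight * mean_endpoint_error(
        pred_flow, target_flow
    )
```

In a setup like this, the flow term acts as an auxiliary signal that pushes the shared features to encode motion, while the weight controls how strongly it competes with the classification term.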

Related benchmarks

Task                      Dataset                     Metric           Result  Rank
Action Recognition        UCF101                      Accuracy         83.9    365
Action Recognition        UCF101 (mean of 3 splits)   Accuracy         83.9    357
Action Recognition        UCF101 (test)               Accuracy         83.9    307
Action Recognition        HMDB51 (test)               Accuracy         0.564   249
Action Recognition        HMDB51                      3-Fold Accuracy  56.4    191
Video Action Recognition  HMDB-51 (3 splits)          Accuracy         56.4    116
Action Recognition        UCF101 (Split 1)            -                -       105
Action Recognition        HMDB51 (split 1)            Top-1 Acc        56.4    75
Action Classification     HMDB51 (split1)             Accuracy         56.4    58
Action Recognition        UCF101 (1)                  Accuracy         83.9    29
Showing 10 of 14 rows
