Compressed Video Action Recognition

About

Training robust deep video representations has proven to be much more challenging than learning deep image representations. This is in part due to the enormous size of raw video streams and their high temporal redundancy; the true and interesting signal is often drowned in too much irrelevant data. Motivated by the fact that video compression (using H.264, HEVC, etc.) can reduce the superfluous information by up to two orders of magnitude, we propose to train a deep network directly on the compressed video. This representation has a higher information density, and we found the training to be easier. In addition, the signals in a compressed video provide free, albeit noisy, motion information. We propose novel techniques to use them effectively. Our approach is about 4.6 times faster than Res3D and 2.7 times faster than ResNet-152. On the task of action recognition, our approach outperforms all the other methods on the UCF-101, HMDB-51, and Charades datasets.

Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl • 2017

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Action Recognition | UCF101 | Accuracy 94.9 | 365 |
| Action Recognition | UCF101 (mean of 3 splits) | Accuracy 90.4 | 357 |
| Action Recognition | UCF101 (test) | Accuracy 90.4 | 307 |
| Action Recognition | HMDB51 (test) | Accuracy 0.591 | 249 |
| Action Recognition | HMDB51 | Top-1 Acc 70.2 | 225 |
| Action Recognition | HMDB51 | 3-Fold Accuracy 70.2 | 191 |
| Action Recognition | UCF-101 | Top-1 Acc 94.9 | 147 |
| Video Action Recognition | HMDB-51 (3 splits) | Accuracy 59.1 | 116 |
| Action Recognition | UCF101 (Split 1) | -- | 105 |
| Action Recognition | Charades (val) | mAP 21.9 | 69 |
Showing 10 of 21 rows
