LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition

About

This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource efficient video recognition, suitable for both online and offline scenarios. Exploiting decent yet computationally efficient features derived at a coarse scale with a lightweight CNN model, LiteEval dynamically decides on-the-fly whether to compute more powerful features for incoming video frames at a finer scale to obtain more details. This is achieved by a coarse LSTM and a fine LSTM operating cooperatively, as well as a conditional gating module to learn when to allocate more computation. Extensive experiments are conducted on two large-scale video benchmarks, FCVID and ActivityNet, and the results demonstrate LiteEval requires substantially less computation while offering excellent classification accuracy for both online and offline predictions.
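The control flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the scalar "frames", the exponential-average state updates (standing in for the two LSTMs), the hand-written `threshold_gate` (standing in for the learned conditional gating module), and the cost constants are all hypothetical placeholders chosen so the example runs with the standard library only. It also simply leaves the fine state unchanged when the gate declines, which is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical relative costs, chosen only for illustration (not from the paper).
COARSE_COST = 1.0
FINE_COST = 10.0


@dataclass
class RunStats:
    fine_frames: List[int]  # indices of frames where fine features were computed
    total_cost: float       # accumulated relative compute cost
    fine_state: float       # toy stand-in for the fine LSTM hidden state


def threshold_gate(coarse_state: float, fine_state: float) -> bool:
    """Hand-written stand-in for the learned gating module (an assumption)."""
    return abs(coarse_state - fine_state) > 0.3


def coarse_to_fine(
    frames: Sequence[float],
    gate: Callable[[float, float], bool] = threshold_gate,
) -> RunStats:
    coarse_state = 0.0
    fine_state = 0.0
    fine_frames: List[int] = []
    cost = 0.0
    for t, frame in enumerate(frames):
        # Cheap pass on every frame (stand-in for the lightweight CNN).
        coarse_feat = frame
        cost += COARSE_COST
        # Toy recurrent update standing in for the coarse LSTM.
        coarse_state = 0.5 * coarse_state + 0.5 * coarse_feat
        # Decide on the fly whether this frame deserves the expensive pass.
        if gate(coarse_state, fine_state):
            fine_feat = frame  # stand-in for the more powerful fine-scale features
            cost += FINE_COST
            # Toy recurrent update standing in for the fine LSTM.
            fine_state = 0.5 * fine_state + 0.5 * fine_feat
            fine_frames.append(t)
    return RunStats(fine_frames, cost, fine_state)
```

With the toy input `[0.0, 0.0, 1.0, 1.0, 0.0]`, the default gate fires only at frame index 2, for a total relative cost of 15.0, versus 55.0 when the gate always fires (for example `gate=lambda c, f: True`). Because each decision uses only the current and past frames, the same loop applies to streaming input.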

Zuxuan Wu, Caiming Xiong, Yu-Gang Jiang, Larry S. Davis • 2019

Related benchmarks

Task                       Dataset                   Result          Rank
Video Recognition          FCVID (test)              mAP 80          28
Action Recognition         ActivityNet               Accuracy 72.7   22
Action Recognition         ActivityNet v1.3 (test)   mAP 72.7        19
Video Recognition          Kinetics Mini             Top-1 Acc 61    18
Video Recognition          Mini-Kinetics (test)      Accuracy 61     17
Online action recognition  50Salads (test)           Accuracy 40.3   7
Action Recognition         FCVID                     Accuracy 80     6