Actionness Estimation Using Hybrid Fully Convolutional Networks

About

Actionness was introduced to quantify the likelihood that a specific location contains a generic action instance. Accurate and efficient estimation of actionness is important in video analysis and may benefit related tasks such as action recognition and action detection. This paper presents a new deep architecture for actionness estimation, called the hybrid fully convolutional network (H-FCN), which is composed of an appearance FCN (A-FCN) and a motion FCN (M-FCN). These two FCNs leverage the strong capacity of deep models to estimate actionness maps from the perspectives of static appearance and dynamic motion, respectively. In addition, the fully convolutional nature of H-FCN allows it to efficiently process videos of arbitrary size. Experiments on the challenging Stanford40, UCF Sports, and JHMDB datasets verify the effectiveness of H-FCN for actionness estimation and demonstrate that our method outperforms previous ones. Moreover, we apply the estimated actionness maps to action proposal generation and action detection. Our actionness maps substantially advance the current state-of-the-art performance on these tasks.
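
The two-stream, fully convolutional idea described above can be illustrated with a toy sketch. This is not the authors' implementation: each "stream" here is a single hand-written convolution followed by a sigmoid, the function names (`conv2d_same`, `stream_actionness`, `hybrid_actionness`) are hypothetical, and averaging the two maps is an assumed fusion rule that the abstract does not specify. The sketch only shows why a network built purely from convolutions can produce a per-location actionness map for inputs of any height and width.

```python
import math


def conv2d_same(image, kernel):
    """Zero-padded 2-D convolution on nested lists; output has the input's size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    # Positions outside the image count as zero padding.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def stream_actionness(inp, kernel):
    """One toy stream: convolution, then a sigmoid to get a score in (0, 1) per location."""
    scores = conv2d_same(inp, kernel)
    return [[1.0 / (1.0 + math.exp(-s)) for s in row] for row in scores]


def hybrid_actionness(frame, flow_mag, app_kernel, mot_kernel):
    """Fuse appearance and motion maps by averaging (an assumed fusion rule)."""
    appearance = stream_actionness(frame, app_kernel)   # stands in for A-FCN
    motion = stream_actionness(flow_mag, mot_kernel)    # stands in for M-FCN
    return [
        [0.5 * (a + m) for a, m in zip(a_row, m_row)]
        for a_row, m_row in zip(appearance, motion)
    ]


if __name__ == "__main__":
    kernel = [[0.1] * 3 for _ in range(3)]  # toy weights, not learned
    for height, width in [(4, 6), (5, 3)]:
        frame = [[1.0] * width for _ in range(height)]
        flow = [[0.5] * width for _ in range(height)]
        amap = hybrid_actionness(frame, flow, kernel, kernel)
        print(height, width, len(amap), len(amap[0]))
```

Because no layer fixes the input resolution, the same weights apply to a 4x6 and a 5x3 input, and the output map always matches the input's spatial size. A real H-FCN would use deep, learned convolutional stacks in place of the single toy kernel.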

Limin Wang, Yu Qiao, Xiaoou Tang, Luc Van Gool • 2016

Related benchmarks

Task	Dataset	Result
Action Detection	JHMDB (split-1)	Brush Hair AP 76.4	12
Actionness estimation	UCF Sports (test)	mAP 82.7	7
Actionness estimation	JHMDB (test)	mAP 86.5	5
Spatial action detection	J-HMDB	Video mAP (IoU=0.5) 56.4	5
Spatio-temporal Action Localization	JHMDB (3-split average)	Frame mAP 39.9	5
Actionness estimation	Stanford 40 (test)	mAP 79.7	4
