Dance with Flow: Two-in-One Stream Action Detection

About

The goal of this paper is to detect the spatio-temporal extent of an action. Two-stream detection networks based on RGB and optical flow provide state-of-the-art accuracy at the expense of a large model size and heavy computation. We propose to embed RGB and optical flow into a single two-in-one stream network with new layers. A motion condition layer extracts motion information from flow images; a motion modulation layer uses this information to generate transformation parameters that modulate the low-level RGB features. The method is easily embedded in existing appearance- or two-stream action detection networks and is trained end-to-end. Experiments demonstrate that leveraging the motion condition to modulate RGB features improves detection accuracy. With only half the computation and parameters of state-of-the-art two-stream methods, our two-in-one stream network still achieves impressive results on UCF101-24, UCFSports and J-HMDB.
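The abstract does not spell out the form of the modulation. One common reading of "transformation parameters" is an element-wise affine transform (a per-pixel scale γ and shift β, in the style of FiLM or spatial feature transform layers). The pure-Python sketch below assumes exactly that: a hypothetical `motion_condition` layer maps a two-channel flow field (dx, dy) to a condition map, a hypothetical `motion_modulation` layer turns that map into per-pixel γ and β, and `modulate` computes γ·F + β on the RGB features F. The 1x1 convolutions, the ReLU, the channel widths, and all function names are illustrative assumptions, not the authors' architecture.

```python
from typing import List, Tuple

# A feature map is stored channel-first as maps[c][y][x].
FeatureMap = List[List[List[float]]]
Matrix = List[List[float]]


def conv1x1(x: FeatureMap, weight: Matrix, bias: List[float]) -> FeatureMap:
    """Per-pixel linear map across channels (a 1x1 convolution).

    weight has shape [out_channels][in_channels]; bias has length out_channels.
    """
    in_ch, height, width = len(x), len(x[0]), len(x[0][0])
    return [
        [
            [
                bias[o] + sum(weight[o][i] * x[i][y][xx] for i in range(in_ch))
                for xx in range(width)
            ]
            for y in range(height)
        ]
        for o in range(len(weight))
    ]


def relu(x: FeatureMap) -> FeatureMap:
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]


def motion_condition(flow: FeatureMap, weight: Matrix, bias: List[float]) -> FeatureMap:
    """Hypothetical motion condition layer: 2-channel flow (dx, dy) -> condition map."""
    return relu(conv1x1(flow, weight, bias))


def motion_modulation(
    cond: FeatureMap,
    w_gamma: Matrix, b_gamma: List[float],
    w_beta: Matrix, b_beta: List[float],
) -> Tuple[FeatureMap, FeatureMap]:
    """Hypothetical motion modulation layer: condition map -> per-pixel (gamma, beta)."""
    return conv1x1(cond, w_gamma, b_gamma), conv1x1(cond, w_beta, b_beta)


def modulate(rgb: FeatureMap, gamma: FeatureMap, beta: FeatureMap) -> FeatureMap:
    """Assumed affine modulation of RGB features: out = gamma * rgb + beta, element-wise."""
    return [
        [
            [g * f + b for f, g, b in zip(frow, grow, brow)]
            for frow, grow, brow in zip(fch, gch, bch)
        ]
        for fch, gch, bch in zip(rgb, gamma, beta)
    ]


# Toy usage: a 1x2 spatial grid, 2 flow channels, 1 RGB feature channel.
flow = [[[1.0, 0.0]], [[0.0, 2.0]]]          # dx and dy at two pixels
rgb = [[[2.0, 3.0]]]                           # low-level RGB features
cond = motion_condition(flow, [[1.0, 1.0]], [0.0])
gamma, beta = motion_modulation(cond, [[0.5]], [1.0], [[1.0]], [0.0])
out = modulate(rgb, gamma, beta)               # out == [[[4.0, 8.0]]]
```

Because γ and β vary per pixel, the flow branch can amplify or suppress appearance features spatially, for example where motion is large, while only the RGB stream carries the full backbone. A real network would use learned, multi-layer convolutions on tensors rather than nested lists.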

Jiaojiao Zhao, Cees G.M. Snoek · 2019

Related benchmarks

Task	Dataset	Metric	Result	#
Action Detection	JHMDB-21	video-mAP@0.5	74.7	21
Spatio-temporal action detection	UCFSports	mAP@0.50	96.52	13
Video Action Detection	UCF101-24	F-mAP@0.5	78.5	13
Action Detection	UCF101-24	video-mAP@0.5	48.3	13
Action Detection	JHMDB (trimmed)	Video-mAP@0.5	74.7	12
Spatio-temporal action detection	UCF101-24	mAP@0.20	78.48	11
Spatio-temporal action detection	UCF101-24	F@0.5	78.5	10
Action Detection	UCF101-24 (untrimmed)	Video-mAP@0.5	50.3	10
Spatio-temporal action detection	J-HMDB	mAP@0.50	74.74	9
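The "@0.5" and "@0.20" suffixes in the table are overlap thresholds. Under one common definition for video-level metrics (assumed here, not taken from this page), a detected action tube matches a ground-truth tube when their spatio-temporal IoU reaches the threshold, where that IoU is the temporal IoU of the two frame spans multiplied by the mean per-frame box IoU over the frames they share; frame-level metrics apply the threshold to per-frame box IoU instead. The sketch computes only this matching criterion, not the full average-precision calculation, and the names `tube_iou` and `is_true_positive` are hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Tube = Dict[int, Box]                    # frame index -> box


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(det: Tube, gt: Tube) -> float:
    """Temporal IoU of the frame sets times mean spatial IoU over shared frames."""
    det_frames, gt_frames = set(det), set(gt)
    shared = det_frames & gt_frames
    if not shared:
        return 0.0
    temporal = len(shared) / len(det_frames | gt_frames)
    spatial = sum(box_iou(det[f], gt[f]) for f in shared) / len(shared)
    return temporal * spatial


def is_true_positive(det: Tube, gt: Tube, threshold: float = 0.5) -> bool:
    """A detection counts as correct at, e.g., video-mAP@0.5 if tube IoU >= 0.5."""
    return tube_iou(det, gt) >= threshold


# Ground truth spans frames 0-3; the detection spans frames 1-4.
gt = {f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 4)}
det = {f: (0.0, 0.0, 10.0, 10.0) for f in range(1, 5)}
det_narrow = {f: (0.0, 0.0, 10.0, 5.0) for f in range(1, 5)}
# tube_iou(det, gt): temporal 3/5 times spatial 1.0, so it passes at 0.5.
# tube_iou(det_narrow, gt): temporal 3/5 times spatial 0.5, so it fails at 0.5.
```

The same detection can therefore pass at a loose threshold such as 0.20 and fail at 0.5, which is why results at different thresholds are reported separately.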
