Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos

About

Every moment counts in action recognition. A comprehensive understanding of human activity in video requires labeling every frame according to the actions occurring, placing multiple labels densely over a video sequence. To study this problem we extend the existing THUMOS dataset and introduce MultiTHUMOS, a new dataset of dense labels over unconstrained internet videos. Modeling multiple, dense labels benefits from temporal relations within and across classes. We define a novel variant of long short-term memory (LSTM) deep networks for modeling these temporal relations via multiple input and output connections. We show that this model improves action labeling accuracy and further enables deeper understanding tasks ranging from structured retrieval to action prediction.
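As a rough illustration of the dense labeling setup described above, the sketch below stores labels as a frames-by-classes binary matrix (several actions may be active in one frame) and scores predictions with per-frame mean average precision (mAP), the metric reported in the benchmarks. This is a minimal sketch of a common AP formulation, not the authors' evaluation code. The class names, toy scores, and the choice to give a class with no positive frames an AP of 0 are all assumptions made for the example.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP for one class over all frames.

    Frames are ranked by descending score; AP is the mean of the
    precision values measured at each positive frame in that ranking.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank
    # Assumption: a class with no positive frames contributes AP = 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    score_matrix: List[List[float]], label_matrix: List[List[int]]
) -> float:
    """mAP over classes; both matrices are indexed as [frame][class]."""
    num_classes = len(label_matrix[0])
    aps = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_matrix]
        class_labels = [row[c] for row in label_matrix]
        aps.append(average_precision(class_scores, class_labels))
    return sum(aps) / len(aps)


# Toy example: 4 frames, 2 hypothetical classes ("run", "jump").
# Frame 1 carries both labels at once, which is what "dense,
# multiple labels" means in this setting.
labels = [[1, 0], [1, 1], [0, 1], [0, 0]]
scores = [[0.9, 0.8], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]

toy_map = mean_average_precision(scores, labels)
```

In this toy case the first class is ranked perfectly, while the second class has a negative frame scored above both positives, so the averaged value falls below 1.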

Serena Yeung, Olga Russakovsky, Ning Jin, Mykhaylo Andriluka, Greg Mori, Li Fei-Fei • 2015

Related benchmarks

Task	Dataset	Result
Online Action Detection	THUMOS14 (test)	mAP 41.3	93
Action Detection	THUMOS 2014 (test)	--	79
Activity Detection	Charades localize v1	mAP 8.94	52
Action Recognition	THUMOS-14 (test)	mAP 41.3	26
Activity Detection	MultiTHUMOS	mAP 29.6	16
Action Recognition (Dense Labeling)	MultiTHUMOS (test)	mAP 29.7	15
Action Recognition (Dense Labeling)	THUMOS (test)	mAP 41.3	7


