
Timeception for Complex Action Recognition

About

This paper focuses on the temporal aspect of recognizing human activities in videos, an important visual cue that has long been undervalued. We revisit the conventional definition of activity and restrict it to Complex Action: a set of one-actions with a weak temporal pattern that serves a specific purpose. Related works use spatiotemporal 3D convolutions with a fixed kernel size, which are too rigid to capture the variety in temporal extents of complex actions and too short for long-range temporal modeling. In contrast, we use multi-scale temporal convolutions, and we reduce the complexity of 3D convolutions. The outcome is Timeception convolution layers, which reason about minute-long temporal patterns, a factor of 8 longer than the best related works. As a result, Timeception achieves impressive accuracy in recognizing the human activities of Charades, Breakfast Actions, and MultiTHUMOS. Further, we demonstrate that Timeception learns long-range temporal dependencies and tolerates the temporal extents of complex actions.

Noureldien Hussein, Efstratios Gavves, Arnold W.M. Smeulders • 2018
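
The multi-scale temporal convolution idea from the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the kernel sizes (3, 5, 7), the random weights, and the per-channel (depthwise) filtering are assumptions chosen for illustration. Each branch filters the time axis with a different kernel size, and the branch outputs are concatenated along the channel axis.

```python
import numpy as np


def depthwise_temporal_conv(x, kernels):
    """Filter each channel of x along time with its own 1D kernel.

    x:       array of shape (T, C), T timesteps and C channels.
    kernels: array of shape (k, C), one length-k filter per channel.
    Returns an array of shape (T, C), using zero padding to keep T fixed.
    """
    k = kernels.shape[0]
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    padded = np.pad(x, ((pad_left, pad_right), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    # Accumulate one kernel tap at a time; channels are never mixed.
    for offset in range(k):
        out += padded[offset:offset + x.shape[0]] * kernels[offset]
    return out


def multi_scale_temporal_conv(x, kernel_sizes=(3, 5, 7), seed=0):
    """Apply one depthwise temporal branch per kernel size and concatenate.

    The kernel sizes and random weights are illustrative assumptions;
    a trained layer would learn the weights.
    """
    rng = np.random.default_rng(seed)
    branches = []
    for k in kernel_sizes:
        kernels = rng.standard_normal((k, x.shape[1])) / k
        branches.append(depthwise_temporal_conv(x, kernels))
    return np.concatenate(branches, axis=1)


# Hypothetical input: 32 timesteps of 4-channel features.
features = np.ones((32, 4))
out = multi_scale_temporal_conv(features)
print(out.shape)  # (32, 12): three branches of 4 channels each
```

In this sketch each branch needs k·C weights, whereas a temporal convolution that mixes all C input channels into C output channels would need k·C·C, which is one way the cost of temporal modeling can be reduced.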

Related benchmarks

Task                                  Dataset               Result               Rank
Action Recognition                    Charades              mAP 0.411            64
Action Recognition                    Charades (test)       mAP 0.411            53
Action Recognition                    Charades v1 (test)    --                   52
Action Recognition                    Breakfast             Top-1 Accuracy 71.3  28
Single-label activity classification  Breakfast             Accuracy 71.3        21
Action Recognition                    Charades v1 (val)     mAP 41.1             15
Human Activity Recognition            Breakfast             Accuracy 71.3        14
Long-form Video Classification        Breakfast             Top-1 Accuracy 71.3  14
Action Recognition                    Breakfast (1357:335)  Accuracy 86.9        13
Video Understanding                   Breakfast             Top-1 Acc 71.3       12
Showing 10 of 13 rows
