
When will you do what? - Anticipating Temporal Occurrences of Activities

About

Analyzing human actions in videos has gained increased attention recently. While most works focus on classifying and labeling observed video frames or anticipating the very recent future, making long-term predictions over more than just a few seconds is a task with many practical applications that has not yet been addressed. In this paper, we propose two methods to predict a considerably large number of future actions and their durations. Both a CNN and an RNN are trained to predict future video labels from previously seen content. We show that our methods generate accurate predictions of the future even for long videos with a large number of different actions, and that they can cope with noisy or erroneous input information.

Yazan Abu Farha, Alexander Richard, Juergen Gall • 2018

Related benchmarks

Task | Dataset | Metric | Result | Rank
Action Anticipation | Breakfast | MoC Accuracy | 22.44 | 64
Action Anticipation | DARai (Coarse) | MoC Accuracy | 30.75 | 64
Long-term Action Anticipation | 50 Salads | MoC Accuracy | 30.77 | 56
Action Anticipation | UTKinects | MoC Accuracy | 25 | 56
Action Anticipation | NTURGBD | MoC Accuracy | 16.74 | 56
Action Anticipation | DARai Fine-grained | MoC Accuracy | 0.0907 | 56
Dense anticipation mean over classes | Breakfast (test) | Mean Error @ 10% Horizon | 12.8 | 28
Dense anticipation mean over classes | 50Salads (test) | Mean Error (10%) | 25.5 | 22
Next Action Anticipation | Breakfast (test) | Accuracy | 30.1 | 11
Long-term Action Anticipation | 50 Salads (test) | MoC (alpha=0.2, beta=0.1) | 30.06 | 10
Showing 10 of 14 rows
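
Most rows above report MoC (mean over classes) accuracy. As a rough illustration only, the sketch below assumes the common reading of that metric: frame-wise accuracy is computed separately for each ground-truth class and the per-class values are then averaged, so that long or frequent actions do not dominate the score. The function name `moc_accuracy` and the action labels are hypothetical and are not taken from the paper or from any benchmark toolkit; the exact evaluation protocol behind each row may differ.

```python
from collections import defaultdict


def moc_accuracy(ground_truth, predicted):
    """Mean-over-classes accuracy for two equal-length frame label sequences.

    Assumption: only classes that occur in the ground truth are averaged.
    """
    if len(ground_truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    if not ground_truth:
        raise ValueError("label sequences must not be empty")

    total = defaultdict(int)    # frames per ground-truth class
    correct = defaultdict(int)  # correctly predicted frames per class
    for gt_label, pred_label in zip(ground_truth, predicted):
        total[gt_label] += 1
        if gt_label == pred_label:
            correct[gt_label] += 1

    # Average the per-class accuracies with equal weight per class.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical six-frame future segment with two action classes.
gt = ["pour_milk"] * 4 + ["stir"] * 2
pred = ["pour_milk"] * 3 + ["stir"] * 3

score = moc_accuracy(gt, pred)
print(f"MoC accuracy: {score:.3f}")
```

In this toy case the plain frame accuracy is 5/6, while the per-class accuracies are 3/4 and 2/2, so the class-averaged score differs from the frame-level one.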
