When will you do what? - Anticipating Temporal Occurrences of Activities

About

Analyzing human actions in videos has gained increased attention recently. While most works focus on classifying and labeling observed video frames or anticipating the very near future, making long-term predictions over more than just a few seconds is a task with many practical applications that has not yet been addressed. In this paper, we propose two methods to predict a considerably large number of future actions and their durations. Both a CNN and an RNN are trained to learn future video labels based on previously seen content. We show that our methods generate accurate predictions of the future even for long videos with a huge number of different actions, and can even deal with noisy or erroneous input information.
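The abstract describes predicting a sequence of future actions together with their durations. The rollout below is a minimal sketch of that general idea under stated assumptions. It is not the authors' CNN or RNN: the learned network is replaced by a hypothetical count-based `SuccessorModel` that predicts the most frequent next action seen in training and that action's mean duration. The sketch also assumes the observation ends at a segment boundary, so the remaining duration of the current action is ignored. `SuccessorModel` and `anticipate` are illustrative names only.

```python
from collections import Counter, defaultdict


class SuccessorModel:
    """Hypothetical stand-in for a learned model (NOT the paper's RNN/CNN):
    predicts the most frequent successor action seen in training, paired
    with that action's mean duration in frames."""

    def __init__(self, training_videos):
        # training_videos: list of videos, each a list of (label, duration) segments
        self.successors = defaultdict(Counter)
        durations = defaultdict(list)
        for video in training_videos:
            # Count label -> next-label transitions between consecutive segments.
            for (label, _), (nxt, _) in zip(video, video[1:]):
                self.successors[label][nxt] += 1
            for label, dur in video:
                durations[label].append(dur)
        self.mean_dur = {k: sum(v) / len(v) for k, v in durations.items()}

    def predict_next(self, label):
        """Return (next_label, duration) or None if the label has no known successor."""
        counts = self.successors.get(label)
        if not counts:
            return None
        nxt = counts.most_common(1)[0][0]
        return nxt, self.mean_dur[nxt]


def anticipate(model, observed, horizon):
    """Autoregressively roll the model forward from the last observed segment
    until `horizon` future frames are labeled; returns frame-level labels."""
    future = []
    label = observed[-1][0]
    while len(future) < horizon:
        step = model.predict_next(label)
        if step is None:  # no known successor: stop early
            break
        label, dur = step
        # Expand the predicted segment into frames (at least one per step,
        # which guarantees the loop makes progress).
        future.extend([label] * max(1, round(dur)))
    return future[:horizon]


train = [
    [("pour", 3), ("stir", 2), ("drink", 4)],
    [("pour", 5), ("stir", 4), ("drink", 2)],
]
model = SuccessorModel(train)
print(anticipate(model, [("pour", 2)], 5))
```

Swapping a trained sequence model in for `predict_next` keeps the same loop: each step emits one (label, duration) pair, and frames are expanded until the prediction horizon is filled.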

Yazan Abu Farha, Alexander Richard, Juergen Gall• 2018

Related benchmarks

Task	Dataset	Metric	Result
Action Anticipation	Breakfast	MoC Accuracy	22.44	64
Action Anticipation	DARai (Coarse)	MoC Accuracy	30.75	64
Long-term Action Anticipation	50 Salads	MoC Accuracy	30.77	56
Action Anticipation	UTKinects	MoC Accuracy	25	56
Action Anticipation	NTURGBD	MoC Accuracy	16.74	56
Action Anticipation	DARai Fine-grained	MoC Accuracy	0.0907	56
Dense anticipation mean over classes	Breakfast (test)	Mean Error @ 10% Horizon	12.8	28
Dense anticipation mean over classes	50Salads (test)	Mean Error (10%)	25.5	22
Next Action Anticipation	Breakfast (test)	Accuracy	30.1	11
Long-term Action Anticipation	50 Salads (test)	MoC (alpha=0.2, beta=0.1)	30.06	10

Showing 10 of 14 rows
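Most rows above report MoC (mean over classes) accuracy. The following is a minimal sketch of one common way to compute it for a single video, assuming frame-level labels for the predicted part: accuracy is taken per class over the frames whose ground-truth label is that class, then averaged over the classes present. The helper `anticipation_window` assumes that alpha is the observed fraction of the video and beta the fraction to be predicted. Both function names are hypothetical, results are fractions (the table reports percentages), and exact protocols, such as how scores are aggregated across videos, may differ between benchmarks.

```python
from collections import defaultdict


def moc_accuracy(ground_truth, prediction):
    """Mean-over-classes frame accuracy: per-class accuracy over the frames
    whose ground-truth label is that class, averaged over the classes that
    occur in the ground truth."""
    if len(ground_truth) != len(prediction):
        raise ValueError("sequences must have equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for gt, pred in zip(ground_truth, prediction):
        total[gt] += 1
        correct[gt] += int(gt == pred)
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


def anticipation_window(labels, alpha, beta):
    """Split a frame-level label sequence: the first `alpha` fraction is
    observed, and the next `beta` fraction of the video is to be predicted."""
    n = len(labels)
    obs_end = int(alpha * n)
    pred_end = obs_end + int(beta * n)
    return labels[:obs_end], labels[obs_end:pred_end]


gt = ["a", "a", "a", "b"]
pred = ["a", "a", "b", "b"]
# Class "a": 2/3 correct, class "b": 1/1 correct, so MoC is 5/6 (about 0.833),
# whereas plain frame accuracy here would be 3/4 = 0.75.
print(moc_accuracy(gt, pred))
```

Averaging per class rather than per frame keeps long, frequent actions from dominating the score, which is why MoC can differ noticeably from plain frame accuracy.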
