OadTR: Online Action Detection with Transformers

About

Most recent approaches for online action detection apply Recurrent Neural Networks (RNNs) to capture long-range temporal structure. However, RNNs suffer from non-parallelism and vanishing gradients, which makes them hard to optimize. In this paper, we propose a new encoder-decoder framework based on Transformers, named OadTR, to tackle these problems. The encoder, to which a task token is attached, captures the relationships and global interactions between historical observations. The decoder extracts auxiliary information by aggregating anticipated future clip representations. OadTR can therefore recognize current actions by encoding historical information and predicting future context simultaneously. We extensively evaluate the proposed OadTR on three challenging datasets: HDD, TVSeries, and THUMOS14. The experimental results show that OadTR achieves higher training and inference speeds than current RNN-based approaches, and significantly outperforms the state-of-the-art methods in terms of both mAP and mcAP. Code is available at https://github.com/wangxiang1230/OadTR.

Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Zhengrong Zuo, Changxin Gao, Nong Sang • 2021
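
The encoder-decoder idea in the abstract can be illustrated with a toy, standard-library-only sketch. This is not the authors' implementation (that lives in the linked repository): the helper names, the single-head attention without learned projections, the mean pooling of future predictions, the concatenation of the two features, and the use of raw history clips as decoder memory are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(logits)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


def encode_with_task_token(history, task_token):
    """Encoder-style step (assumed form): the task token joins the history
    clips and attends over all tokens, itself included."""
    tokens = history + [task_token]
    return attend(task_token, tokens, tokens)


def anticipate_future(memory, future_queries):
    """Decoder-style step (assumed form): each future query cross-attends to
    the memory, and the predicted clips are mean-pooled into one vector."""
    predictions = [attend(q, memory, memory)[0] for q in future_queries]
    dim = len(predictions[0])
    pooled = [sum(p[i] for p in predictions) / len(predictions) for i in range(dim)]
    return predictions, pooled


random.seed(0)
DIM, T_HIST, T_FUT = 4, 6, 2  # hypothetical sizes
history = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(T_HIST)]
task_token = [random.gauss(0, 1) for _ in range(DIM)]
future_queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(T_FUT)]

task_feature, task_weights = encode_with_task_token(history, task_token)
# Simplification: raw history serves as memory; a real model would use encoder outputs.
_, future_feature = anticipate_future(history, future_queries)

# Assumed fusion: concatenate both features before a (not shown) classifier.
current_feature = task_feature + future_feature
print(len(current_feature))
```

Because the task token attends to every historical clip in one step, nothing in this computation has to be unrolled sequentially over time, which is the property the abstract contrasts with RNNs.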

Related benchmarks

Task	Dataset	Result
Online Action Detection	THUMOS14 (test)	mAP 65.5	93
Online Action Detection	TVSeries	mcAP 87.2	71
Online Action Detection	TVSeries (test)	mcAP 87.2	41
Online Action Detection	THUMOS 14	Mean F-AP 58.3	37
Online Action Detection	HDD	Overall mAP 29.8	29
Action Anticipation	TVSeries (test)	mcAP 77.8	22
Action Anticipation	THUMOS-14 (test)	mAP 45.9	14
Action Anticipation	THUMOS 2014	mAP (Avg) 53.5	14
Action Anticipation	THUMOS 14	Accuracy @ 0.25s 59.8	8
Procedural mistake detection	Assembly101-O	Precision 24.3	8

Showing 10 of 18 rows
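
The mAP and mcAP figures above are per-class averages of frame-level average precision. The page does not define either metric, so the sketch below assumes the formulation commonly used for TVSeries, where calibrated precision is w·TP / (w·TP + FP) with w the ratio of negative to positive frames. The function name and the toy scores are hypothetical.

```python
def average_precision(scores, labels, calibrated=False):
    """Frame-level AP for one class.

    scores: per-frame confidence for the class.
    labels: per-frame ground truth (1 = positive, 0 = negative).
    calibrated: if True, weight true positives by the negative/positive
    ratio w, i.e. precision = w*TP / (w*TP + FP) (assumed definition).
    """
    num_pos = sum(labels)
    if num_pos == 0:
        return 0.0
    w = (len(labels) - num_pos) / num_pos if calibrated else 1.0

    # Rank frames from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    precision_sum = 0.0
    for _, label in ranked:
        if label:
            tp += 1
            precision_sum += (w * tp) / (w * tp + fp)
        else:
            fp += 1
    return precision_sum / num_pos


def mean_over_classes(per_class_scores, per_class_labels, calibrated=False):
    """mAP (or mcAP when calibrated=True): the mean of per-class AP values."""
    aps = [
        average_precision(s, l, calibrated)
        for s, l in zip(per_class_scores, per_class_labels)
    ]
    return sum(aps) / len(aps)


# Toy data: 6 frames, 2 positives ranked first and third.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 0, 0, 0]
ap = average_precision(scores, labels)
cap = average_precision(scores, labels, calibrated=True)
print(round(ap, 4), round(cap, 4))  # 0.8333 0.9
```

Calibration reduces the penalty that heavy class imbalance places on precision, which is why the two numbers differ on the same ranking.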
