Faster-TAD: Towards Temporal Action Detection with Proposal Generation and Classification in a Unified Network
About
Temporal action detection (TAD) aims to detect the semantic labels and temporal boundaries of action instances in untrimmed videos. Current mainstream approaches are multi-step solutions, which fall short in efficiency and flexibility. In this paper, we propose a unified network for TAD, termed Faster-TAD, by re-purposing a Faster R-CNN-like architecture. To tackle the unique difficulties of TAD, we make important improvements over the original framework: we propose a new Context-Adaptive Proposal Module and an innovative Fake-Proposal Generation Block. In addition, we use atomic action features to improve performance. Faster-TAD simplifies the TAD pipeline and achieves strong performance on several benchmarks, including ActivityNet-1.3 (40.01% mAP), HACS Segments (38.39% mAP), and SoccerNet-Action Spotting (54.09% mAP). It outperforms existing single-network detectors by a large margin.
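As background for the mAP figures above: TAD benchmarks match each predicted (start, end) segment to a ground-truth segment by temporal IoU (tIoU), and ActivityNet-1.3 results are conventionally reported as mAP averaged over tIoU thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below illustrates only that matching criterion. It is not the authors' code or the official evaluation toolkit, and the names `temporal_iou`, `TIOU_THRESHOLDS`, and `matched_thresholds` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical type: a segment is (start, end) in seconds, with end >= start.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1-D time segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# ActivityNet-1.3-style thresholds: 0.50, 0.55, ..., 0.95 (10 values).
TIOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred: Segment, gt: Segment) -> List[float]:
    """Thresholds at which `pred` would count as a hit for `gt`.

    Simplified on purpose: ignores class labels, confidence ranking,
    and the rule that each ground truth can be matched only once.
    """
    iou = temporal_iou(pred, gt)
    return [t for t in TIOU_THRESHOLDS if iou >= t]


if __name__ == "__main__":
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))        # overlap 5 s, union 15 s
    print(matched_thresholds((0.0, 10.0), (0.0, 10.0)))  # perfect match: all thresholds
```

A full mAP computation would additionally sort predictions by confidence, greedily match each to an unmatched ground truth per class, and integrate precision over recall at every threshold before averaging.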
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Action spotting | SoccerNet v2 (test) | Average-mAP (Tight, 1-5 s) | 54.1 | 23 |
| Action spotting | SoccerNet v2 (challenge) | Average-mAP (Tight, 1-5 s) | 64.88 | 14 |