TriDet: Temporal Action Detection with Relative Boundary Modeling
About
In this paper, we present TriDet, a one-stage framework for temporal action detection. Existing methods often suffer from imprecise boundary predictions because action boundaries in videos are ambiguous. To alleviate this problem, we propose a novel Trident-head that models the action boundary via an estimated relative probability distribution around the boundary. In the feature pyramid of TriDet, we propose an efficient Scalable-Granularity Perception (SGP) layer that mitigates the rank loss problem of self-attention in video features and aggregates information across different temporal granularities. Benefiting from the Trident-head and the SGP-based feature pyramid, TriDet achieves state-of-the-art performance on three challenging benchmarks, THUMOS14, HACS, and EPIC-KITCHENS 100, at lower computational cost than previous methods. For example, TriDet reaches an average mAP of $69.3\%$ on THUMOS14, outperforming the previous best by $2.5\%$ with only $74.6\%$ of its latency. The code is released at https://github.com/sssste/TriDet.
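As a rough illustration of what "an estimated relative probability distribution around the boundary" can mean, the sketch below turns a set of per-bin scores into a distribution with a softmax and reads the boundary offset as its expected value. This is a simplified toy under our own assumptions, not the Trident-head from the released code: the function name `expected_boundary_offset`, the bin layout, and the input scores are all hypothetical.

```python
import math


def expected_boundary_offset(bin_logits):
    """Toy sketch (assumption, not the TriDet implementation).

    bin_logits[i] is a hypothetical score that the boundary lies i bins
    away from the current instant. The scores are normalised with a
    softmax, and the offset is the expectation of the bin index.
    """
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(bin_logits)
    exps = [math.exp(x - peak) for x in bin_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Expected bin index under the estimated distribution.
    return sum(i * p for i, p in enumerate(probs))


if __name__ == "__main__":
    # Uniform scores place the expected boundary in the middle bin.
    print(expected_boundary_offset([0.0, 0.0, 0.0]))
    # A sharp peak in the last bin pulls the estimate towards that bin.
    print(expected_boundary_offset([0.0, 0.0, 10.0]))
```

Predicting a distribution rather than a single offset lets an ambiguous boundary spread its probability mass over several neighbouring bins, which is the motivation the abstract gives for the Trident-head.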
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Temporal Action Detection | THUMOS-14 (test) | mAP@tIoU=0.5 | 72.9 | 330 |
| Temporal Action Localization | THUMOS14 (test) | AP @ IoU=0.5 | 73.3 | 319 |
| Temporal Action Localization | THUMOS-14 (test) | mAP@0.3 | 83.6 | 308 |
| Temporal Action Localization | ActivityNet 1.3 (val) | AP@0.5 | 54.7 | 257 |
| Temporal Action Detection | ActivityNet v1.3 (val) | mAP@0.5 | 54.7 | 185 |
| Temporal Action Detection | ActivityNet 1.3 | mAP@0.5 | 56.7 | 93 |
| Temporal Action Detection | ActivityNet 1.3 (test) | Average mAP | 36.8 | 80 |
| Temporal Action Detection | THUMOS 14 | mAP@0.3 | 83.6 | 71 |
| Temporal Action Detection | HACS segment (test) | mAP@0.5 | 62.4 | 30 |
| Temporal Action Detection | THUMOS14 (test) | mAP | 68.2 | 25 |
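
The results above are reported at temporal IoU (tIoU) thresholds such as 0.3 and 0.5: a predicted segment counts as a match only if its overlap with a ground-truth segment reaches the threshold. The following is a minimal sketch of that overlap measure, assuming segments are given as `(start, end)` pairs in seconds; the function name `temporal_iou` is ours and is not taken from the TriDet repository or any benchmark toolkit.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two segments given as (start, end) pairs.

    Illustrative helper (assumed name and signature), not the official
    evaluation code of any benchmark.
    """
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Length of the overlap, clamped at zero for disjoint segments.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    # Union is the sum of both lengths minus the shared part.
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # A prediction shifted by 2 s against a 4 s ground-truth action.
    print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
```

In the usage example the two segments share 2 s out of a 6 s union, so the tIoU is 2/6 (about 0.33). That prediction would count as a match at a threshold of 0.3 but not at 0.5, which is why scores at looser thresholds are higher than those at stricter ones.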