TriDet: Temporal Action Detection with Relative Boundary Modeling

About

In this paper, we present TriDet, a one-stage framework for temporal action detection. Existing methods often suffer from imprecise boundary predictions because action boundaries in videos are ambiguous. To alleviate this problem, we propose a novel Trident-head that models the action boundary via an estimated relative probability distribution around the boundary. In the feature pyramid of TriDet, we propose an efficient Scalable-Granularity Perception (SGP) layer that mitigates the rank loss problem of self-attention in video features and aggregates information across different temporal granularities. Benefiting from the Trident-head and the SGP-based feature pyramid, TriDet achieves state-of-the-art performance on three challenging benchmarks (THUMOS14, HACS and EPIC-KITCHENS-100) at lower computational cost than previous methods. For example, TriDet reaches an average mAP of $69.3\%$ on THUMOS14, outperforming the previous best by $2.5\%$ with only $74.6\%$ of its latency. The code is available at https://github.com/sssste/TriDet.
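To make the idea of reading a boundary off a probability distribution concrete, here is a minimal sketch. It is an assumption for illustration only, not the Trident-head from the paper: the function name `expected_boundary_offset` and the use of a plain softmax over per-bin scores are hypothetical, and the abstract does not specify how the distribution is actually formed.

```python
import math

def expected_boundary_offset(logits):
    """Turn per-bin scores into a distribution and return its expected bin index.

    Hypothetical helper: bin i stands for "the boundary lies i steps away".
    This is a generic expectation over a softmax, not the paper's exact head.
    """
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]          # softmax: probabilities sum to 1
    return sum(i * p for i, p in enumerate(probs))

# A distribution sharply peaked on bin 2 gives an offset close to 2.
offset = expected_boundary_offset([-100.0, -100.0, 0.0, -100.0])
```

Because the estimate is a weighted average over neighbouring bins rather than a single arg-max, an ambiguous boundary yields a fractional offset instead of a hard choice between adjacent positions.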

Dingfeng Shi, Yujie Zhong, Qiong Cao, Lin Ma, Jia Li, Dacheng Tao • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Temporal Action Detection | THUMOS-14 (test) | mAP@tIoU=0.5 | 72.9 | 330 |
| Temporal Action Localization | THUMOS14 (test) | AP @ IoU=0.5 | 73.3 | 319 |
| Temporal Action Localization | THUMOS-14 (test) | mAP@0.3 | 83.6 | 308 |
| Temporal Action Localization | ActivityNet 1.3 (val) | AP@0.5 | 54.7 | 257 |
| Temporal Action Detection | ActivityNet v1.3 (val) | mAP@0.5 | 54.7 | 185 |
| Temporal Action Detection | ActivityNet 1.3 | mAP@0.5 | 56.7 | 93 |
| Temporal Action Detection | ActivityNet 1.3 (test) | Average mAP | 36.8 | 80 |
| Temporal Action Detection | THUMOS 14 | mAP@0.3 | 83.6 | 71 |
| Temporal Action Detection | HACS segment (test) | mAP@0.5 | 62.4 | 30 |
| Temporal Action Detection | THUMOS14 (test) | mAP | 68.2 | 25 |
Showing 10 of 25 rows
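
The thresholds in the metric names (for example mAP@tIoU=0.5) refer to temporal intersection-over-union between a predicted segment and a ground-truth segment. The sketch below shows that overlap computation; the function name `temporal_iou` and the `(start, end)` tuple layout are illustrative assumptions, not taken from the TriDet code or any benchmark toolkit.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two segments given as (start, end) in the same time unit."""
    ps, pe = pred
    gs, ge = gt
    inter = max(0.0, min(pe, ge) - max(ps, gs))   # overlap length, 0 if disjoint
    union = (pe - ps) + (ge - gs) - inter         # combined covered length
    return inter / union if union > 0 else 0.0

# Segments 2-6 s and 4-8 s overlap for 2 s out of a 6 s union.
iou = temporal_iou((2.0, 6.0), (4.0, 8.0))
is_match_at_05 = iou >= 0.5
```

Under a 0.5 threshold this prediction would not count as a correct detection, while under a looser threshold such as 0.3 it would, which is why the same method scores higher at mAP@0.3 than at mAP@0.5.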
