RCL: Recurrent Continuous Localization for Temporal Action Detection

About

Temporal representation is the cornerstone of modern action detection techniques. State-of-the-art methods mostly rely on a dense anchoring scheme, in which anchors are sampled uniformly over the temporal domain on a discretized grid and accurate boundaries are then regressed from them. In this paper, we revisit this foundational stage and introduce Recurrent Continuous Localization (RCL), which learns a fully continuous anchoring representation. Specifically, the proposed representation builds upon an explicit model conditioned on video embeddings and temporal coordinates, which ensures that segments of arbitrary length can be detected. To optimize the continuous representation, we develop an effective scale-invariant sampling strategy and recurrently refine the prediction in subsequent iterations. Our continuous anchoring scheme is fully differentiable, so it can be seamlessly integrated into existing detectors, e.g., BMN and G-TAD. Extensive experiments on two benchmarks demonstrate that our continuous representation consistently surpasses its discretized counterparts by ~2% mAP. As a result, RCL achieves 52.92% mAP@0.5 on THUMOS14 and 37.65% mAP on ActivityNet v1.3, outperforming all existing single-model detectors.
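To make the contrast between grid anchors and a continuous representation concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the tiny MLP, the [0, 1] time normalization, and the random parameters are all assumptions made for illustration, and a simple greedy resampling loop stands in for the paper's learned recurrent refinement. The only ideas taken from the abstract are that the scorer is conditioned on a video embedding plus real-valued temporal coordinates, and that candidates are resampled at a scale proportional to the segment length.

```python
import numpy as np

def grid_anchors(num_bins):
    """Discretized scheme: every (start, end) pair on a uniform grid over [0, 1]."""
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    return [(s, e) for i, s in enumerate(edges) for e in edges[i + 1:]]

def score_segment(video_embedding, start, end, params):
    """Hypothetical explicit model: a tiny MLP over [embedding, start, end].

    start and end are real-valued inputs, so any segment in [0, 1] can be
    queried, not only the segments that fall on grid points.
    """
    w1, b1, w2, b2 = params
    x = np.concatenate([video_embedding, [start, end]])
    h = np.tanh(w1 @ x + b1)
    logit = float(w2 @ h + b2)
    return 1.0 / (1.0 + np.exp(-logit))

def refine(video_embedding, start, end, params, steps=3, num_samples=16, rng=None):
    """Illustrative stand-in for recurrent refinement (assumption, not the paper's).

    Each step resamples candidates around the current segment with offsets
    proportional to its length and keeps the best-scoring one.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    best = (start, end)
    best_score = score_segment(video_embedding, start, end, params)
    for _ in range(steps):
        length = best[1] - best[0]
        offsets = rng.uniform(-0.25, 0.25, size=(num_samples, 2)) * length
        for ds, de in offsets:
            s = float(np.clip(best[0] + ds, 0.0, 1.0))
            e = float(np.clip(best[1] + de, 0.0, 1.0))
            if e <= s:
                continue  # skip degenerate segments
            score = score_segment(video_embedding, s, e, params)
            if score > best_score:
                best, best_score = (s, e), score
    return best, best_score

# Usage with random (untrained) parameters, purely to show the call pattern.
rng = np.random.default_rng(42)
dim, hidden = 8, 16
params = (rng.normal(size=(hidden, dim + 2)), rng.normal(size=hidden),
          rng.normal(size=hidden), 0.0)
embedding = rng.normal(size=dim)
segment, score = refine(embedding, 0.30, 0.55, params)
```

With a grid of `num_bins` bins the detector can only score `num_bins * (num_bins + 1) / 2` fixed segments, whereas `score_segment` accepts any pair of coordinates, which is the property the abstract attributes to the continuous representation.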

Qiang Wang, Yanhao Zhang, Yun Zheng, Pan Pan • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Temporal Action Localization | THUMOS14 (test) | AP @ IoU=0.5 | 52.9 | 319 |
| Temporal Action Localization | THUMOS-14 (test) | mAP@0.3 | 70.1 | 308 |
| Temporal Action Localization | ActivityNet 1.3 (val) | AP@0.5 | 55.15 | 257 |
| Temporal Action Localization | THUMOS 2014 | mAP@0.30 | 70.1 | 93 |
| Temporal Action Detection | ActivityNet 1.3 | mAP@0.5 | 51.7 | 93 |
| Temporal Action Detection | THUMOS 14 | mAP@0.3 | 70.1 | 71 |
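The thresholds in the metric names above (0.3, 0.5) refer to temporal IoU: a predicted segment counts as a correct detection only if its overlap with a ground-truth segment reaches the threshold. A minimal sketch of that overlap computation follows; the function names and the (start, end) tuple convention are assumptions for illustration, not code from any benchmark toolkit.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two 1D segments given as (start, end)."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0

def is_match(pred, gt, threshold):
    """True if the prediction overlaps the ground truth at or above the threshold."""
    return temporal_iou(pred, gt) >= threshold

# A prediction of 0-10 s against a ground truth of 5-15 s overlaps for 5 s
# out of a 15 s union, so it passes a 0.3 threshold but not a 0.5 threshold.
iou = temporal_iou((0.0, 10.0), (5.0, 15.0))
```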
