
DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer

About

In this paper, we examine a key limitation of query-based detectors for temporal action detection (TAD): they directly adapt architectures originally designed for object detection. Although existing models are effective, they struggle to address challenges unique to TAD, such as redundancy in multi-scale features and a limited ability to capture sufficient temporal context. To address these issues, we propose DiGIT, a temporal action detection transformer with a multi-dilated gated encoder and a central-adjacent region integrated decoder. Our approach replaces the existing encoder, which consists of multi-scale deformable attention and a feedforward network, with the multi-dilated gated encoder. The proposed encoder reduces the redundant information caused by multi-level features while retaining the ability to capture fine-grained and long-range temporal information. Furthermore, we introduce a central-adjacent region integrated decoder that uses a more comprehensive sampling strategy for deformable cross-attention to capture essential information. Extensive experiments demonstrate that DiGIT achieves state-of-the-art performance on THUMOS14, ActivityNet v1.3, and HACS-Segment. Code is available at: https://github.com/Dotori-HJ/DiGIT
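To make the idea of combining several dilation rates with a gate more concrete, here is a minimal pure-Python sketch. It is a generic illustration and not the authors' encoder: the function names (`dilated_conv1d`, `multi_dilated_gated`), the sigmoid gate, the averaging of branches, and the dilation rates are all assumptions chosen for illustration. The actual design is in the linked repository.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Same-length 1D dilated convolution with zero padding (odd kernel size)."""
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation  # taps spread apart by the dilation rate
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def multi_dilated_gated(x, kernel, gate_kernel, dilations=(1, 2, 4)):
    """Average several dilated branches, each scaled by a sigmoid gate.

    Hypothetical illustration only; not the DiGIT encoder.
    """
    out = [0.0] * len(x)
    for d in dilations:
        feat = dilated_conv1d(x, kernel, d)       # content branch
        gate = dilated_conv1d(x, gate_kernel, d)  # gating branch
        for t in range(len(x)):
            out[t] += feat[t] * sigmoid(gate[t]) / len(dilations)
    return out
```

Small dilation rates look at neighbouring time steps (fine-grained context), while larger rates reach further along the sequence at the same kernel size (long-range context). The gate lets each position down-weight a branch's contribution.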

Ho-Joong Kim, Yearang Lee, Jung-Ho Hong, Seong-Whan Lee • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Temporal Action Localization | THUMOS14 (test) | AP @ IoU=0.5 | 71.9 | 319
Temporal Action Localization | ActivityNet 1.3 (val) | AP@0.5 | 54.4 | 257
Temporal Action Detection | ActivityNet 1.3 | mAP@0.5 | 62 | 131
Temporal Action Detection | HACS segment (test) | mAP@0.5 | 62.4 | 30
Temporal Action Detection | THUMOS14 (50% Seen / 50% Unseen) | mAP@0.3 | 19.1 | 11
Temporal Action Detection | ActivityNet v1.3 (50% Seen / 50% Unseen) | mAP@0.50 | 27.5 | 11
Temporal Action Detection | THUMOS14 (75% Seen / 25% Unseen) | mAP@0.3 | 29 | 11
Temporal Action Detection | ActivityNet v1.3 (75% Seen / 25% Unseen) | mAP @ IoU=0.5 | 32.2 | 11
Temporal Forgery Localization | ActivityForensics Intra-Domain | AP@0.75 | 78.61 | 4
Temporal Forgery Localization | ActivityForensics Open-World | AP@0.75 | 88.99 | 4
Showing 10 of 12 rows
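
The metrics above are thresholded on temporal IoU (for example, AP@0.5 counts a prediction as correct when its overlap with a ground-truth segment is at least 0.5). As a minimal sketch of that overlap test, assuming segments are given as (start, end) pairs in seconds, and with hypothetical helper names `temporal_iou` and `is_true_positive`:

```python
def temporal_iou(pred, gt):
    """Intersection over union of two 1D segments given as (start, end)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """A prediction matches a ground-truth segment if its IoU reaches the threshold."""
    return temporal_iou(pred, gt) >= threshold
```

Official benchmark evaluation additionally ranks predictions by confidence and matches each ground-truth segment at most once before computing average precision; this sketch covers only the overlap criterion.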
