Action Detection via an Image Diffusion Process

About

Action detection aims to localize the starting and ending points of action instances in untrimmed videos and to predict the classes of those instances. In this paper, we make the observation that the outputs of the action detection task can be formulated as images. From this novel perspective, we tackle action detection as a three-image generation process that produces the starting-point, ending-point and action-class predictions as images, using our proposed Action Detection Image Diffusion (ADI-Diff) framework. Because these images differ from natural images and exhibit special properties, we also explore a Discrete Action-Detection Diffusion Process and a Row-Column Transformer design to better handle their processing. Our ADI-Diff framework achieves state-of-the-art results on two widely used datasets.
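
To make the idea of "outputs formulated as images" concrete, here is a minimal sketch of one way a set of detections could be written as three 2D grids and read back. The layout (one row per action instance, one column per discretized temporal position or class id) and the names `encode_as_images` and `decode_from_images` are assumptions for illustration only; the abstract does not specify the encoding that ADI-Diff actually uses.

```python
# Illustrative only: this grid layout is an assumption, not the ADI-Diff encoding.

def encode_as_images(instances, num_positions, num_classes):
    """Encode (start, end, class_id) tuples as three binary 2D grids.

    Each row corresponds to one action instance. Columns index discretized
    temporal positions (start and end grids) or class ids (class grid).
    """
    start_img = [[0] * num_positions for _ in instances]
    end_img = [[0] * num_positions for _ in instances]
    class_img = [[0] * num_classes for _ in instances]
    for row, (start, end, class_id) in enumerate(instances):
        start_img[row][start] = 1
        end_img[row][end] = 1
        class_img[row][class_id] = 1
    return start_img, end_img, class_img


def decode_from_images(start_img, end_img, class_img):
    """Recover (start, end, class_id) by taking the argmax of each row."""
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)

    return [
        (argmax(s), argmax(e), argmax(c))
        for s, e, c in zip(start_img, end_img, class_img)
    ]


# Two hypothetical instances on a 10-position timeline with 3 classes.
instances = [(2, 5, 0), (7, 9, 1)]
images = encode_as_images(instances, num_positions=10, num_classes=3)
assert decode_from_images(*images) == instances
```

In this toy layout every row is one-hot, which is one simple example of how such grids differ from natural images.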

Lin Geng Foo, Tianjiao Li, Hossein Rahmani, Jun Liu • 2024

Related benchmarks

Task                          Dataset                  Result               Rank
Temporal Action Detection     THUMOS-14 (test)         mAP@tIoU=0.5: 76.5   330
Temporal Action Localization  THUMOS14 (test)          AP @ IoU=0.5: 76.5   319
Temporal Action Localization  ActivityNet 1.3 (val)    AP@0.5: 56.9         257
Temporal Action Detection     ActivityNet v1.3 (val)   mAP@0.5: 56.9        185
Online Action Detection       THUMOS14 (test)          mAP: 70.8            86
Action Detection              THUMOS14 (test)          mAP@0.3: 84.9        3
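
Several of the metrics listed here are thresholded on temporal IoU (for example 0.3 or 0.5). As a rough illustration, the sketch below computes the standard temporal IoU between a predicted and a ground-truth segment and checks it against a threshold. The function names are hypothetical, and the exact evaluation protocol behind each benchmark row is not stated on this page, so this should be read as the general definition, not as the scoring code used for these results.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments given in the same time unit."""
    # Overlap length, clamped at zero when the segments are disjoint.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union length = sum of both lengths minus the overlap.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def meets_threshold(pred, gt, threshold=0.5):
    """True if the prediction overlaps the ground truth by at least `threshold`."""
    return temporal_iou(pred, gt) >= threshold


# Hypothetical segments in seconds: overlap is 3 s, union is 5 s.
iou = temporal_iou((2.0, 6.0), (3.0, 7.0))
assert abs(iou - 0.6) < 1e-9
assert meets_threshold((2.0, 6.0), (3.0, 7.0), threshold=0.5)
assert not meets_threshold((0.0, 1.0), (2.0, 3.0), threshold=0.3)
```

A full mAP computation would additionally rank predictions by confidence and match each ground-truth instance at most once, which this sketch leaves out.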
