Spotting Temporally Precise, Fine-Grained Events in Video

About

We introduce the task of spotting temporally precise, fine-grained events in video (detecting the precise moment in time events occur). Precise spotting requires models to reason globally about the full time scale of actions and locally to identify subtle frame-to-frame appearance and motion differences that distinguish events during these actions. Surprisingly, we find that top-performing solutions to prior video understanding tasks such as action detection and segmentation do not simultaneously meet both requirements. In response, we propose E2E-Spot, a compact, end-to-end model that performs well on the precise spotting task and can be trained quickly on a single GPU. We demonstrate that E2E-Spot significantly outperforms recent baselines adapted from the video action detection, segmentation, and spotting literature to the precise spotting task. Finally, we contribute new annotations and splits to several fine-grained sports action datasets to make these datasets suitable for future work on precise spotting.

James Hong, Haotian Zhang, Michaël Gharbi, Matthew Fisher, Kayvon Fatahalian • 2022

Related benchmarks

Task                  Dataset                    Result                              Rank
Event Spotting        FineGYM                    mAP 67.23                           23
Event Spotting        Comp FS                    mAP 94.9                            23
Event Spotting        FS-Perf                    mAP 0.96                            23
Action spotting       SoccerNet v2 (test)        Average-mAP (Tight 1-5 s) 61.82     23
Event Spotting        Tennis                     mAP (delta=1) 96.9                  15
Action spotting       SoccerNet v2 (challenge)   Average-mAP (Tight 1-5 s) 66.73     14
Video Event Spotting  FS-Perf                    mAP (@ delta=0) 40.5                11
Video Event Spotting  FS-Comp                    mAP (@ delta=0) 37.6                11
Video Event Spotting  Tennis                     mAP (@ delta=0) 71.6                8
Video Event Spotting  FineDiving                 mAP (@ delta=0) 30.2                8
Showing 10 of 17 rows
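
Several results above are reported as mAP at a frame tolerance delta. As a rough illustration of what such a tolerance means, the sketch below computes a non-interpolated average precision for one event class in one video, counting a prediction as correct when it falls within delta frames of a not-yet-matched ground-truth event. The function name, input layout, and matching rule are assumptions made for this example; the evaluation protocol actually used for these benchmarks may differ (for instance in interpolation or in how classes and videos are averaged).

```python
from typing import List, Tuple


def average_precision(
    preds: List[Tuple[int, float]], truths: List[int], delta: int
) -> float:
    """Non-interpolated AP for one event class (hypothetical helper).

    preds:  (frame_index, confidence) pairs predicted by a model.
    truths: ground-truth frame indices of the events.
    delta:  tolerance in frames; delta=0 requires the exact frame.
    """
    if not truths:
        return 0.0
    matched = [False] * len(truths)
    true_positives = 0
    precision_sum = 0.0
    # Visit predictions from most to least confident.
    ranked = sorted(preds, key=lambda p: -p[1])
    for rank, (frame, _score) in enumerate(ranked, start=1):
        # Find the closest unmatched ground-truth event within tolerance.
        best = None
        for i, t in enumerate(truths):
            if matched[i] or abs(frame - t) > delta:
                continue
            if best is None or abs(frame - t) < abs(frame - truths[best]):
                best = i
        if best is not None:
            matched[best] = True
            true_positives += 1
            # Precision at the rank where recall increases.
            precision_sum += true_positives / rank
    return precision_sum / len(truths)


# Toy data: two true events, one prediction is off by a single frame.
example_preds = [(10, 0.9), (21, 0.8), (50, 0.3)]
example_truths = [10, 20]
ap_exact = average_precision(example_preds, example_truths, delta=0)
ap_loose = average_precision(example_preds, example_truths, delta=1)
```

On this toy input the one-frame miss counts as an error at delta=0 but as a hit at delta=1, which is why scores at delta=0 tend to be much lower than scores at looser tolerances.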
