
CARLOR @ Ego4D Step Grounding Challenge: Bayesian temporal-order priors for test time refinement

About

The goal of the Step Grounding task is to locate temporal boundaries of activities based on natural language descriptions. This technical report introduces a Bayesian-VSLNet to address the challenge of identifying such temporal segments in lengthy, untrimmed egocentric videos. Our model significantly improves upon traditional models by incorporating a novel Bayesian temporal-order prior during inference, enhancing the accuracy of moment predictions. This prior adjusts for cyclic and repetitive actions within videos. Our evaluations demonstrate superior performance over existing methods, achieving state-of-the-art results on the Ego4D Goal-Step dataset with a 35.18 Recall Top-1 at 0.3 IoU and 20.48 Recall Top-1 at 0.5 IoU on the test set.
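The idea of refining predictions with a temporal-order prior at inference time can be illustrated with a small Bayes-rule example. This is only a minimal sketch under stated assumptions, not the report's actual formulation: the Gaussian prior shape, the binning of the video into five segments, the `expected_pos` and `sigma` values, and the function names `gaussian_prior` and `refine` are all hypothetical. It shows how a prior over where a step is expected to occur can disambiguate two near-equal score peaks, as might arise when an action is repeated in the video.

```python
import math

def gaussian_prior(num_bins, expected_pos, sigma):
    """Hypothetical prior over normalized video positions in [0, 1],
    peaked at expected_pos (an assumed shape, not taken from the report)."""
    centers = [(i + 0.5) / num_bins for i in range(num_bins)]
    weights = [math.exp(-0.5 * ((c - expected_pos) / sigma) ** 2) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]

def refine(likelihood, prior):
    """Bayes rule: the posterior is proportional to likelihood times prior."""
    unnorm = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Made-up model scores over 5 temporal bins with two near-equal peaks,
# e.g. the same action appearing twice in the video.
likelihood = [0.05, 0.40, 0.05, 0.42, 0.08]

# Assumption: the queried step is expected early in the video.
prior = gaussian_prior(5, expected_pos=0.25, sigma=0.2)
posterior = refine(likelihood, prior)

raw_best = max(range(5), key=likelihood.__getitem__)      # bin chosen without the prior
refined_best = max(range(5), key=posterior.__getitem__)   # bin chosen with the prior
print(raw_best, refined_best)
```

With these made-up numbers, the raw scores slightly favor the later peak, while the prior shifts the decision to the earlier one. A real system would need the prior to be estimated from data, which this sketch does not attempt.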

Carlos Plou, Lorenzo Mur-Labadia, Ruben Martinez-Cantin, Ana C. Murillo • 2024

Related benchmarks

Task            Dataset                 Metric               Result   Rank
Step Grounding  Ego4D (test)            Recall@1 (IoU=0.3)   35.18    7
Step Grounding  Ego4D Goal-Step (val)   Recall@1 (IoU=0.3)   18.15    4
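For readers unfamiliar with the metric, Recall@1 at a given IoU threshold can be computed roughly as follows. This is a sketch assuming one top-ranked predicted segment per query, segments given as (start, end) pairs in seconds, and results reported as percentages; the official challenge evaluation may differ in details. The function names and the sample segments are made up for illustration.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_1(preds, gts, threshold):
    """Percentage of queries whose top prediction reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(gts)

# Made-up top-1 predictions and ground-truth segments for four queries.
preds = [(10.0, 20.0), (30.0, 50.0), (5.0, 6.0), (0.0, 10.0)]
gts = [(12.0, 22.0), (40.0, 60.0), (50.0, 60.0), (0.0, 10.0)]

r_at_03 = recall_at_1(preds, gts, 0.3)
r_at_05 = recall_at_1(preds, gts, 0.5)
print(r_at_03, r_at_05)
```

In this toy data the four IoU values are about 0.67, 0.33, 0.0, and 1.0, so three of four queries count as hits at the 0.3 threshold and two of four at the 0.5 threshold.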
