Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training

About

Temporal grounding is crucial in multimodal learning, but it poses challenges when applied to animal behavior data due to the sparsity and uniform distribution of moments. To address these challenges, we propose a novel Positional Recovery Training framework (Port), which prompts the model with the start and end times of specific animal behaviors during training. Specifically, \port{} enhances the baseline model with a Recovering branch to reconstruct corrupted label sequences and align distributions via a Dual-alignment method. This allows the model to focus on specific temporal regions prompted by ground-truth information. Extensive experiments on the Animal Kingdom dataset demonstrate the effectiveness of \port{}, achieving an IoU@0.3 of 38.52. It emerges as one of the top performers in the sub-track of MMVRAC in ICME 2024 Grand Challenges.

Sheng Yan, Xin Du, Zongying Li, Yi Wang, Hongcang Jin, Mengyuan Liu• 2024

Related benchmarks

Task	Dataset	Result	Rank
Temporal Video Grounding	Animal Kingdom (test)	IoU@0.338.52		3

Showing 1 of 1 rows

Other info

Follow for update

@wizwand_team Discord