
ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022

About

In this report, we present the ReLER@ZJU-Alibaba submission to the Ego4D Natural Language Queries (NLQ) Challenge at CVPR 2022. Given a video clip and a text query, the goal of this challenge is to locate the temporal moment in the clip where the answer to the query can be obtained. To tackle this task, we propose a multi-scale cross-modal transformer and a video frame-level contrastive loss to fully uncover the correlation between language queries and video clips. In addition, we propose two data augmentation strategies to increase the diversity of training samples. The experimental results demonstrate the effectiveness of our method. The final submission ranked first on the leaderboard.

Naiyuan Liu, Xiaohan Wang, Xiaobo Li, Yi Yang, Yueting Zhuang • 2022

Related benchmarks

Task                      Dataset            Metric               Result   Rank
Natural Language Queries  Ego4D NLQ (val)    Recall@1 (IoU=0.3)   0.1079   23
Natural Language Queries  Ego4D NLQ (test)   R@1 (IoU=0.3)        12.89    21
Temporal Grounding        Ego4D 1.0 (test)   Recall@1 (IoU=0.3)   12.89    7
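
The benchmark results above are reported as Recall@1 at a temporal IoU threshold of 0.3. As a minimal sketch of how this metric is commonly computed (not the official Ego4D evaluation code), the snippet below scores each query by the overlap between the top-ranked predicted time window and the ground-truth window. The names `temporal_iou` and `recall_at_1` and the toy segments are illustrative assumptions, not taken from the report.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec); hypothetical representation


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Segment], gts: List[Segment],
                iou_threshold: float = 0.3) -> float:
    """Fraction of queries whose top-ranked prediction reaches the IoU threshold."""
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return hits / len(gts)


# Toy data, made up for illustration (not from the paper).
preds = [(2.0, 8.0), (10.0, 12.0), (0.0, 5.0)]
gts = [(3.0, 9.0), (20.0, 25.0), (4.0, 6.0)]
score = recall_at_1(preds, gts)  # only the first query clears IoU >= 0.3
```

The function returns a fraction between 0 and 1; leaderboards may show the same quantity as a percentage.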
