Temporal Saliency Query Network for Efficient Video Recognition

About

Efficient video recognition is an active research topic, driven by the explosive growth of multimedia data on the Internet and mobile devices. Most existing methods select salient frames without awareness of class-specific saliency scores, neglecting the implicit association between a frame's saliency and the category it belongs to. To alleviate this issue, we devise a novel Temporal Saliency Query (TSQ) mechanism, which introduces class-specific information to provide fine-grained cues for saliency measurement. Specifically, we model class-specific saliency measurement as a query-response task: for each category, its common pattern is employed as a query, and the most salient frames respond to it. The resulting similarities are then adopted as the frame saliency scores. To achieve this, we propose the Temporal Saliency Query Network (TSQNet), which includes two instantiations of the TSQ mechanism, one based on visual appearance similarities and one based on textual event-object relations. Afterward, cross-modality interactions are imposed to promote information exchange between the two. Finally, we use the class-specific saliencies of the most confident categories generated by the two modalities to select the salient frames. Extensive experiments demonstrate the effectiveness of our method, which achieves state-of-the-art results on the ActivityNet, FCVID and Mini-Kinetics datasets. Our project page is at https://lawrencexia2008.github.io/projects/tsqnet .
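The query-response idea above can be illustrated with a minimal NumPy sketch. This is not the authors' TSQNet implementation: the function names, the cosine-similarity scoring, the probability-weighted aggregation over the top-k classes, and the simple weighted average used in place of the paper's learned cross-modality interactions are all assumptions made for illustration. Each class is represented by a single hypothetical query vector standing in for its "common pattern", and per-frame features are assumed to come from some backbone.

```python
import numpy as np


def class_saliency(frame_feats: np.ndarray, class_queries: np.ndarray,
                   eps: float = 1e-8) -> np.ndarray:
    """Score every frame against every class query (hypothetical helper).

    frame_feats:   (T, D) per-frame features from some backbone
    class_queries: (C, D) one vector per class, standing in for its "common pattern"
    returns:       (C, T) cosine similarities, used as class-specific saliency scores
    """
    f = frame_feats / (np.linalg.norm(frame_feats, axis=1, keepdims=True) + eps)
    q = class_queries / (np.linalg.norm(class_queries, axis=1, keepdims=True) + eps)
    return q @ f.T


def fuse_modalities(visual_sal: np.ndarray, textual_sal: np.ndarray,
                    alpha: float = 0.5) -> np.ndarray:
    """Weighted average of two (C, T) saliency maps.

    Assumed stand-in for TSQNet's learned cross-modality interactions.
    """
    return alpha * visual_sal + (1.0 - alpha) * textual_sal


def select_frames(saliency: np.ndarray, class_probs: np.ndarray,
                  top_classes: int = 2, num_frames: int = 4) -> np.ndarray:
    """Select frames using the saliency rows of the most confident classes.

    saliency:    (C, T) class-specific saliency scores
    class_probs: (C,) predicted class probabilities
    returns:     indices of the chosen frames, in temporal order
    """
    class_probs = np.asarray(class_probs, dtype=float)
    # Indices of the top-k most confident classes, highest first.
    confident = np.argsort(class_probs)[::-1][:top_classes]
    # Weight each confident class's saliency row by its normalized probability.
    weights = class_probs[confident] / class_probs[confident].sum()
    frame_scores = weights @ saliency[confident]  # shape (T,)
    # Keep the highest-scoring frames, then restore temporal order.
    chosen = np.argsort(frame_scores)[::-1][:num_frames]
    return np.sort(chosen)
```

For example, `select_frames(class_saliency(frames, queries), probs, top_classes=2, num_frames=4)` returns the four frame indices that best match the two most likely classes. In a typical efficient-recognition pipeline, only those frames would then be passed to a heavier recognition model. Weighting by class probability is just one simple way to combine the saliency rows of several confident categories.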

Boyang Xia, Zhihao Wang, Wenhao Wu, Haoran Wang, Jungong Han• 2022

Related benchmarks

Task                               Dataset                  Metric          Result (%)  Rank
Action Recognition                 ActivityNet (test)       mAP             93.7        38
Fine-grained Video Categorization  ActivityNet v1.3 (val)   mAP             76.6        32
Action Recognition                 ActivityNet v1.3         mAP             93.7        31
Video Recognition                  FCVID (test)             mAP             83.5        28
Action Recognition                 ActivityNet v1.3 (test)  mAP             93.7        19
Video Recognition                  Kinetics Mini            Top-1 Accuracy  73.2        18
Action Recognition                 ActivityNet 1.3 (val)    Top-1 Accuracy  88.7        7
Action Recognition                 ActivityNet              mAP             93.7        5
