
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

About

Cross-modal retrieval between videos and texts has attracted growing attention due to the rapid emergence of videos on the web. The current dominant approach to this problem is to learn a joint embedding space in which cross-modal similarities are measured. However, simple joint embeddings are insufficient to represent complicated visual and textual details, such as scenes, objects, actions, and their compositions. To improve fine-grained video-text retrieval, we propose a Hierarchical Graph Reasoning (HGR) model, which decomposes video-text matching into global-to-local levels. Specifically, the model disentangles texts into a hierarchical semantic graph with three levels (events, actions, and entities) and relationships across levels. Attention-based graph reasoning is used to generate hierarchical textual embeddings, which guide the learning of diverse and hierarchical video representations. The HGR model aggregates the matching scores from the different video-text levels to capture both global and local details. Experimental results on three video-text datasets demonstrate the advantages of our model. The hierarchical decomposition also enables better generalization across datasets and improves the ability to distinguish fine-grained semantic differences.
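The global-to-local idea can be illustrated with a toy sketch in plain Python. It is not the authors' HGR implementation: the residual dot-product attention step used for graph reasoning, the use of one shared set of frame features for every level (HGR learns separate video representations per level), the softmax temperature of 0.1, and the equal-weight average of the three level scores are all simplifying assumptions, and every function name here is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def attend(query: Vector, neighbors: List[Vector]) -> Vector:
    """Hypothetical residual update: query + attention-weighted sum of neighbors."""
    if not neighbors:
        return list(query)
    weights = softmax([sum(q * n for q, n in zip(query, nb)) for nb in neighbors])
    message = [sum(w * nb[i] for w, nb in zip(weights, neighbors)) for i in range(len(query))]
    return [q + m for q, m in zip(query, message)]


def reason(nodes: Dict[str, Vector], edges: List[Tuple[str, str]]) -> Dict[str, Vector]:
    """One synchronous round of attention-based message passing on an undirected graph."""
    neighbors: Dict[str, List[str]] = {name: [] for name in nodes}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    # Every node is updated from the previous round's embeddings, not partially updated ones.
    return {name: attend(vec, [nodes[n] for n in neighbors[name]]) for name, vec in nodes.items()}


def local_score(text_nodes: List[Vector], frames: List[Vector], temperature: float = 0.1) -> float:
    """Each text node attends over frames; node-to-context cosines are averaged."""
    scores = []
    for node in text_nodes:
        weights = softmax([cosine(node, f) / temperature for f in frames])
        context = [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(len(node))]
        scores.append(cosine(node, context))
    return sum(scores) / len(scores)


def hierarchical_similarity(event: Vector, actions: List[Vector],
                            entities: List[Vector], frames: List[Vector]) -> float:
    """Assumed aggregation: equal-weight mean of one global and two local scores."""
    global_score = cosine(event, mean_vector(frames))
    return (global_score + local_score(actions, frames) + local_score(entities, frames)) / 3


# Toy usage: refine a three-node text graph, then score it against two frame vectors.
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
graph = {"event": [1.0, 1.0, 0.0], "action": [1.0, 0.0, 0.0], "entity": [0.0, 1.0, 0.0]}
refined = reason(graph, [("event", "action"), ("action", "entity")])
score = hierarchical_similarity(refined["event"], [refined["action"]], [refined["entity"]], frames)
print(round(score, 4))
```

The averaging step is what lets a text that matches only at the entity level still receive partial credit, while a text that matches at every level scores highest; how HGR actually weights and trains the levels is described in the paper, not here.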

Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu • 2020

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Text-to-Video Retrieval | MSR-VTT | R@1 | 9.2 | 313
Text-to-Video Retrieval | VATEX | R@1 | 35.1 | 95
Text-to-Video Retrieval | YouCook2 (val) | R@1 | 470 | 66
Text-to-Video Retrieval | VATEX (test) | R@1 | 35.1 | 62
Video Retrieval | ActivityNet-Captions (test) | R@1 | 4 | 38
Partial Relevance Video Retrieval | Charades-STA (test) | R@1 | 1.2 | 29
Partial Relevance Video Retrieval | TVR (test) | R@1 | 1.7 | 25
Text-to-Video Retrieval | MSR-VTT Official full-size (test) | R@1 | 11.1 | 24
Text-to-Video Retrieval | MSR-VTT 1k-Yu (test) | R@1 | 21.7 | 18
Text-to-Video Retrieval | MSR-VTT 1k-Miech (test) | R@1 | 22.9 | 17

(10 of 19 benchmark rows shown.)
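Every result listed above is Recall@1 (R@1): the percentage of text queries whose correct video is ranked first. The sketch below computes Recall@K for the basic one-to-one case, assuming `sim[i][j]` scores text `i` against video `j` and that video `i` is the ground truth for text `i`; resolving ties in the query's favor is also an assumption, and `recall_at_k` is a hypothetical helper. The listed benchmarks follow their own protocols (for example, several captions per video, or partial-relevance settings), so this will not reproduce the reported numbers.

```python
from typing import Sequence


def recall_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Percentage of text queries whose ground-truth video ranks in the top k.

    sim[i][j] is the similarity of text i to video j; video i is assumed to be
    the correct match for text i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # 0-based rank = number of videos scored strictly higher than the correct one
        # (ties therefore count in the query's favor).
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


sim = [
    [0.9, 0.1, 0.3],  # text 0: correct video 0 ranks first
    [0.2, 0.4, 0.8],  # text 1: correct video 1 ranks second
    [0.1, 0.5, 0.7],  # text 2: correct video 2 ranks first
]
print(recall_at_k(sim, 1))  # two of three queries hit at rank 1
print(recall_at_k(sim, 2))  # all three queries hit within the top 2
```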
