Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings

About

We address the problem of cross-modal fine-grained action retrieval between text and video. Cross-modal retrieval is commonly achieved through learning a shared embedding space that can indifferently embed modalities. In this paper, we propose to enrich the embedding by disentangling parts-of-speech (PoS) in the accompanying captions. We build a separate multi-modal embedding space for each PoS tag. The outputs of multiple PoS embeddings are then used as input to an integrated multi-modal space, where we perform action retrieval. All embeddings are trained jointly through a combination of PoS-aware and PoS-agnostic losses. Our proposal enables learning specialised embedding spaces that offer multiple views of the same embedded entities. We report the first retrieval results on fine-grained actions for the large-scale EPIC dataset, in a generalised zero-shot setting. Results show the advantage of our approach for both video-to-text and text-to-video action retrieval. We also demonstrate the benefit of disentangling the PoS for the generic task of cross-modal video retrieval on the MSR-VTT dataset.
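
The pipeline described in the abstract (one embedding space per PoS tag, an integrated space fed by their outputs, and a joint PoS-aware plus PoS-agnostic objective) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two tags in `POS_TAGS`, the linear projections, the hinge triplet loss, the unweighted sum of losses, and all names (`PoSEmbedder`, `joint_loss`, and so on) are hypothetical choices made for the example. In particular, the sketch reuses one negative caption for every loss term, which the abstract does not specify.

```python
import random

# Assumption: two PoS tags, chosen only for illustration.
POS_TAGS = ("verb", "noun")


def init_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Apply a linear projection: multiply a weight matrix by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss (an assumed loss form, not taken from the abstract)."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))


class PoSEmbedder:
    """Hypothetical model: one video/text projection pair per PoS tag,
    plus an integrated space on top of the concatenated PoS outputs."""

    def __init__(self, video_dim, text_dim, pos_dim, joint_dim, seed=0):
        rng = random.Random(seed)
        self.video_proj = {p: init_matrix(pos_dim, video_dim, rng) for p in POS_TAGS}
        self.text_proj = {p: init_matrix(pos_dim, text_dim, rng) for p in POS_TAGS}
        in_dim = pos_dim * len(POS_TAGS)
        self.video_joint = init_matrix(joint_dim, in_dim, rng)
        self.text_joint = init_matrix(joint_dim, in_dim, rng)

    def embed_video(self, video_feat):
        # The same video feature is projected into every PoS space.
        per_pos = {p: linear(self.video_proj[p], video_feat) for p in POS_TAGS}
        concat = [v for p in POS_TAGS for v in per_pos[p]]
        return per_pos, linear(self.video_joint, concat)

    def embed_text(self, text_feats):
        # text_feats maps each PoS tag to a feature vector for the caption
        # words carrying that tag (the caption is disentangled by PoS).
        per_pos = {p: linear(self.text_proj[p], text_feats[p]) for p in POS_TAGS}
        concat = [v for p in POS_TAGS for v in per_pos[p]]
        return per_pos, linear(self.text_joint, concat)


def joint_loss(model, video, caption, negative_caption):
    """Sum of per-PoS (PoS-aware) losses and one integrated (PoS-agnostic) loss."""
    v_pos, v_joint = model.embed_video(video)
    t_pos, t_joint = model.embed_text(caption)
    n_pos, n_joint = model.embed_text(negative_caption)
    pos_aware = sum(triplet_loss(v_pos[p], t_pos[p], n_pos[p]) for p in POS_TAGS)
    pos_agnostic = triplet_loss(v_joint, t_joint, n_joint)
    return pos_aware + pos_agnostic


# Usage with toy features: a video vector and two captions split by PoS.
model = PoSEmbedder(video_dim=4, text_dim=3, pos_dim=2, joint_dim=2)
video = [0.5, -0.2, 0.1, 0.9]
caption = {"verb": [1.0, 0.0, 0.0], "noun": [0.0, 1.0, 0.0]}
negative = {"verb": [0.0, 0.0, 1.0], "noun": [0.0, 1.0, 0.0]}
loss = joint_loss(model, video, caption, negative)
```

In this sketch, retrieval would compare `v_joint` against `t_joint` in the integrated space, while the per-PoS outputs give the separate views of the same video and caption that the abstract mentions.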

Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen • 2019

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy 64.82 | 467 |
| Action Recognition | NTU RGB+D X-sub 120 | Accuracy 33.9 | 430 |
| Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy 27.5 | 336 |
| Action Recognition | NTU-60 (xsub) | Accuracy 65 | 223 |
| Action Recognition | NTU-120 (cross-subject (xsub)) | Accuracy 46.7 | 211 |
| Skeleton-based Action Recognition | NTU RGB+D 120 (X-set) | Top-1 Accuracy 52.8 | 184 |
| Skeleton-based Action Recognition | NTU RGB+D 120 Cross-Subject | Top-1 Accuracy 57.3 | 143 |
| Action Recognition | NTU-60 48/12 split | Top-1 Acc 60.5 | 103 |
| Action Recognition | NTU-120 96/24 split | Top-1 Acc 35.7 | 84 |
| Action Recognition | NTU RGB+D 120 (110/10 Xsub) | Accuracy 49.9 | 66 |
Showing 10 of 93 rows
