RNN Fisher Vectors for Action Recognition and Image Annotation

About

Recurrent Neural Networks (RNNs) have had considerable success in classifying and predicting sequences. We demonstrate that RNNs can also be used to encode sequences and provide effective representations. The methodology we use is based on Fisher Vectors, where the RNNs are the generative probabilistic models and the partial derivatives are computed using backpropagation. State-of-the-art results are obtained in two central but distant tasks, both of which rely on sequences: video action recognition and image annotation. We also show a surprising transfer learning result from the task of image annotation to the task of video action recognition.
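
For readers unfamiliar with the construction, the general Fisher Vector idea the abstract refers to can be sketched as follows. The notation here (θ for the RNN parameters, X = (x_1, ..., x_T) for an input sequence, F_θ for the Fisher information matrix, L_θ for its normalizing factor) is assumed for illustration and does not come from this page; the paper's exact likelihood model and normalization may differ.

```latex
% Sequence log-likelihood under an RNN that predicts each element from its prefix
\log p(X \mid \theta) = \sum_{t=1}^{T} \log p(x_t \mid x_1, \dots, x_{t-1}; \theta)

% Unnormalized representation: gradient with respect to the parameters,
% obtained by backpropagation through the network
G_\theta^{X} = \nabla_\theta \log p(X \mid \theta)

% Fisher Vector: gradient normalized by the Fisher information matrix
\mathcal{G}_\theta^{X} = L_\theta \, G_\theta^{X},
\qquad F_\theta^{-1} = L_\theta^{\top} L_\theta,
\qquad F_\theta = \mathbb{E}_{X \sim p(\cdot \mid \theta)}\!\left[ G_\theta^{X} \, {G_\theta^{X}}^{\top} \right]
```

Under this reading, a sequence of any length maps to a fixed-size vector whose dimension equals the number of parameters differentiated, which is what makes it usable as an input to a standard classifier or retrieval model.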

Guy Lev, Gil Sadeh, Benjamin Klein, Lior Wolf • 2015

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Retrieval | Flickr30k (test) | Recall@1 | 27.4 | 423 |
| Image-to-Text Retrieval | Flickr30k (test) | R@1 | 35.6 | 370 |
| Action Recognition | UCF101 (mean of 3 splits) | Accuracy | 88 | 357 |
| Image Retrieval | Flickr30k (test) | R@1 | 26.2 | 195 |
| Action Recognition | HMDB51 | 3-Fold Accuracy | 54.3 | 191 |
| Image Retrieval | Flickr30K | R@1 | 27.4 | 144 |
| Text-to-Image Retrieval | MSCOCO (1K test) | R@1 | 29.6 | 104 |
| Image-to-Text Retrieval | MSCOCO (1K test) | R@1 | 41.5 | 82 |
| Image Search | Flickr8K | R@1 | 23.2 | 74 |
| Image Annotation | Flickr30k (test) | R@1 | 34.7 | 39 |
Showing 10 of 15 rows
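
Most of the retrieval rows above report Recall@1 (R@1). As a rough illustration of how such a score is commonly computed, here is a minimal sketch. It assumes the usual definition (the percentage of queries for which at least one relevant item appears among the top K results); the function name `recall_at_k` and the toy data are hypothetical and are not taken from the paper or from any benchmark's official evaluation code.

```python
def recall_at_k(rankings, relevant, k=1):
    """Percentage of queries with at least one relevant item in the top-k results.

    rankings: list of ranked result-id lists, one per query (best first).
    relevant: list of sets of relevant ids, aligned with `rankings`.
    """
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0
    hits = sum(
        1
        for ranked, rel in zip(rankings, relevant)
        if any(item in rel for item in ranked[:k])
    )
    return 100.0 * hits / len(rankings)


# Toy data (hypothetical): four text queries, each with one matching image.
rankings = [
    ["img3", "img1"],  # correct image ranked first
    ["img2", "img5"],  # correct image ranked second
    ["img9", "img4"],  # correct image ranked second
    ["img7", "img8"],  # correct image ranked first
]
relevant = [{"img3"}, {"img5"}, {"img4"}, {"img7"}]

print(recall_at_k(rankings, relevant, k=1))  # 50.0
print(recall_at_k(rankings, relevant, k=2))  # 100.0
```

Individual benchmarks may differ in details such as how multiple captions per image are handled, so the numbers in the table should be read against each dataset's own protocol.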
