RNN Fisher Vectors for Action Recognition and Image Annotation

About

Recurrent Neural Networks (RNNs) have had considerable success in classifying and predicting sequences. We demonstrate that RNNs can also be used to encode sequences into effective representations. Our methodology is based on Fisher Vectors: the RNN serves as the generative probabilistic model, and the partial derivatives of its log-likelihood are computed using backpropagation. We obtain state-of-the-art results on two central but distant tasks that both rely on sequences: video action recognition and image annotation. We also show a surprising transfer learning result from the task of image annotation to the task of video action recognition.
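The abstract does not spell out the model details, so the following is only a minimal sketch of the RNN Fisher Vector idea under stated assumptions. A single-layer tanh RNN predicts the next element x_{t+1} from the hidden state h_t. The output distribution is assumed to be an isotropic unit-variance Gaussian. Only the gradient with respect to the output layer (W_out, b_out) is taken, so backpropagation reduces to one closed-form step; a fuller version would backpropagate through all RNN parameters. All function names are hypothetical. The power and L2 normalization follows common Fisher Vector practice rather than anything stated above, and no Fisher-information whitening is applied.

```python
import numpy as np

# Hypothetical sketch: model choices (tanh RNN, unit-variance Gaussian output,
# output-layer gradient only) are assumptions, not the authors' exact setup.

def hidden_states(xs, W_xh, W_hh, b_h):
    """Run a single-layer tanh RNN over xs (T x d); return hidden states (T x k)."""
    h = np.zeros(W_hh.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
    return np.stack(hs)


def log_likelihood(xs, params):
    """Sum over t of log N(x_{t+1}; W_out h_t + b_out, I), constant term dropped."""
    W_xh, W_hh, b_h, W_out, b_out = params
    hs = hidden_states(xs, W_xh, W_hh, b_h)
    preds = hs[:-1] @ W_out.T + b_out  # prediction of x_{t+1} from h_t
    return -0.5 * np.sum((xs[1:] - preds) ** 2)


def output_layer_score(xs, params):
    """Gradient of log_likelihood w.r.t. (W_out, b_out), flattened into one vector."""
    W_xh, W_hh, b_h, W_out, b_out = params
    hs = hidden_states(xs, W_xh, W_hh, b_h)
    resid = xs[1:] - (hs[:-1] @ W_out.T + b_out)  # (T-1) x d prediction errors
    grad_W = resid.T @ hs[:-1]                     # sum_t resid_t h_t^T, shape d x k
    grad_b = resid.sum(axis=0)                     # sum_t resid_t, shape d
    return np.concatenate([grad_W.ravel(), grad_b])


def rnn_fisher_vector(xs, params):
    """Score vector with power and L2 normalization (assumed, common FV practice)."""
    fv = output_layer_score(xs, params)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


# Usage: encode a random 10-step sequence of 4-dim features with an 8-unit RNN.
rng = np.random.default_rng(0)
d, k = 4, 8
params = (0.5 * rng.normal(size=(k, d)), 0.5 * rng.normal(size=(k, k)), np.zeros(k),
          0.5 * rng.normal(size=(d, k)), np.zeros(d))
xs = rng.normal(size=(10, d))  # e.g. a sequence of per-frame feature vectors
fv = rnn_fisher_vector(xs, params)
print(fv.shape)  # (36,)
```

The resulting fixed-length vector (here d*k + d entries, regardless of sequence length) can then be fed to a linear classifier or a retrieval model.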

Guy Lev, Gil Sadeh, Benjamin Klein, Lior Wolf (2015)

Related benchmarks

Task	Dataset	Metric	Result (%)
Text-to-Image Retrieval	Flickr30k (test)	R@1	27.4
Image-to-Text Retrieval	Flickr30k (test)	R@1	35.6
Action Recognition	UCF101 (mean of 3 splits)	Accuracy	88
Image Retrieval	Flickr30k (test)	R@1	26.2
Action Recognition	HMDB51	3-fold accuracy	54.3
Image Retrieval	Flickr30K	R@1	27.4
Text-to-Image Retrieval	MSCOCO (1K test)	R@1	29.6
Image-to-Text Retrieval	MSCOCO (1K test)	R@1	41.5
Image Search	Flickr8K	R@1	23.2
Image Annotation	Flickr30k (test)	R@1	34.7
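Most rows above report R@1 (Recall@1): the percentage of queries whose correct match is ranked first. The snippet below is a minimal sketch of how Recall@K is commonly computed from a query-by-item similarity matrix. The function name and the `relevant` argument (a list of correct item indices per query, since a Flickr30k image has five reference captions) are illustrative assumptions, not the benchmarks' official evaluation code.

```python
import numpy as np


def recall_at_k(sim, relevant, k):
    """Percentage of queries with at least one relevant item among the top k.

    sim: (n_queries, n_items) similarity scores, higher means more similar.
    relevant: one iterable of correct item indices per query (hypothetical format).
    """
    topk = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar items
    hits = [bool(set(row.tolist()) & set(rel)) for row, rel in zip(topk, relevant)]
    return 100.0 * float(np.mean(hits))


# Toy example: 3 queries, 3 candidate items, one correct item per query.
sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.3, 0.5],
                [0.4, 0.6, 0.1]])
relevant = [[0], [0], [1]]
for k in (1, 2, 3):
    print(k, recall_at_k(sim, relevant, k))
```

In the toy example the second query ranks its correct item last, so it only counts as a hit at K = 3.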

