Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books

About

Books are a rich source of both fine-grained information (what a character, an object, or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story). This paper aims to align books with their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in current datasets. To align movies and books, we exploit a neural sentence embedding that is trained in an unsupervised way on a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. We propose a context-aware CNN to combine information from multiple sources. We demonstrate good quantitative performance for movie/book alignment and show several qualitative examples that showcase the diversity of tasks our model can be used for.

Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler • 2015
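
The abstract describes computing similarities between movie clips and book sentences in a shared embedding space. The sketch below is only a rough illustration of that one idea, not the authors' method: it assumes clip and sentence embeddings are already available as plain vectors, uses cosine similarity as the score, and picks the highest-scoring sentence for each clip. The vectors and the function names (`cosine`, `similarity_matrix`, `best_sentence_per_clip`) are hypothetical, and the context-aware CNN that the paper uses to combine several similarity sources is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(clip_vecs, sentence_vecs):
    """Rows index clips, columns index sentences."""
    return [[cosine(c, s) for s in sentence_vecs] for c in clip_vecs]


def best_sentence_per_clip(sim):
    """For each clip (row), return the index of the most similar sentence."""
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


# Toy, made-up embeddings (assumption: both modalities share one 3-d space).
clips = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
sentences = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.3], [0.5, 0.5, 0.5]]

sim = similarity_matrix(clips, sentences)
matches = best_sentence_per_clip(sim)
print(matches)  # [1, 0]
```

Matching each clip independently, as done here, ignores the order of the story; it is meant only to show what a clip-to-sentence similarity score looks like.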

Related benchmarks

Task	Dataset	Metric	Result
Commonsense Reasoning	PIQA	Accuracy	64.86	757
Common Sense Reasoning	COPA	Accuracy	67.3	288
Commonsense Reasoning	OBQA	Accuracy	57.6	187
Commonsense Reasoning	SocialIQA	Accuracy	64.3	164
Commonsense Reasoning	CommonsenseQA (CSQA) v1.0 (test)	Accuracy	53.08	46
Short Text Clustering	SearchSnippets	Accuracy	33.58	38
Short Text Clustering	StackOverflow	Accuracy	9.59	38
Commonsense Reasoning	aNLI	Accuracy	61.88	35
Short Text Clustering	Biomedical	Accuracy	0.1644	19
Short Text Clustering	Biomedical	NMI	1.072	18

