Learning from Video and Text via Large-Scale Discriminative Clustering

About

Discriminative clustering has been successfully applied to a number of weakly supervised learning tasks. Such applications include person and action recognition, text-to-video alignment, and object co-segmentation and colocalization in videos and images. One drawback of discriminative clustering, however, is its limited scalability. We address this issue and propose an online optimization algorithm based on the Block-Coordinate Frank-Wolfe algorithm. We apply the proposed method to the problem of weakly supervised learning of actions and actors from movies together with corresponding movie scripts. Scaling the learning problem up to 66 feature-length movies enables us to significantly improve weakly supervised action recognition.
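
To make the Block-Coordinate Frank-Wolfe idea mentioned above concrete, here is a minimal sketch in plain Python. It is not the authors' online algorithm. It assumes an illustrative quadratic cost 0.5 * tr(Yᵀ B Y) with a symmetric positive semi-definite matrix `B`, and it constrains each row of the assignment matrix `Y` to the probability simplex. The names `objective`, `block_gradient`, and `bcfw`, the toy matrix `B`, and the use of exact line search are all assumptions made for this example.

```python
import random


def objective(Y, B):
    """Illustrative cost 0.5 * tr(Y^T B Y), with Y given as a list of rows."""
    n, k = len(Y), len(Y[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += B[i][j] * sum(Y[i][c] * Y[j][c] for c in range(k))
    return 0.5 * total


def block_gradient(Y, B, i):
    """Gradient of the cost with respect to row i (assumes B is symmetric)."""
    k = len(Y[0])
    return [sum(B[i][j] * Y[j][c] for j in range(len(Y))) for c in range(k)]


def bcfw(Y, B, num_steps, seed=0):
    """Block-coordinate Frank-Wolfe: each step updates one randomly chosen row."""
    rng = random.Random(seed)
    Y = [row[:] for row in Y]  # work on a copy
    history = [objective(Y, B)]
    for _ in range(num_steps):
        i = rng.randrange(len(Y))
        g = block_gradient(Y, B, i)
        # Linear minimization oracle over the simplex: the best vertex is the
        # one-hot vector at the smallest gradient coordinate.
        best = min(range(len(g)), key=g.__getitem__)
        d = [(1.0 if c == best else 0.0) - Y[i][c] for c in range(len(g))]
        # Exact line search for a quadratic: along d the cost is
        # f(0) + gamma * slope + 0.5 * gamma^2 * curvature.
        slope = sum(gc * dc for gc, dc in zip(g, d))
        curvature = B[i][i] * sum(x * x for x in d)
        if curvature > 0.0:
            gamma = min(1.0, max(0.0, -slope / curvature))
        else:
            gamma = 1.0 if slope < 0.0 else 0.0
        # Convex combination of the old row and the chosen vertex, so the
        # row stays on the simplex.
        Y[i] = [y + gamma * x for y, x in zip(Y[i], d)]
        history.append(objective(Y, B))
    return Y, history


# Toy problem: 4 samples, 2 clusters. B is a Gram matrix plus identity (PSD).
features = [0.0, 0.1, 0.9, 1.0]
B = [[features[i] * features[j] + (1.0 if i == j else 0.0)
      for j in range(4)] for i in range(4)]
Y0 = [[1.0, 0.0] for _ in range(4)]
Y_final, history = bcfw(Y0, B, num_steps=50)
print("initial cost:", history[0])
print("final cost:  ", history[-1])
```

Each step touches only one row of `Y`, which is why the per-iteration cost stays small as the number of samples grows. The exact line search used here is a convenience available for quadratic costs; a predefined decaying step size is another common choice.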

Antoine Miech, Jean-Baptiste Alayrac, Piotr Bojanowski, Ivan Laptev, Josef Sivic • 2017

Related benchmarks

Task | Dataset | Metric | Result | Rank
Text-to-Video Retrieval | LSMDC (test) | R@1 | 730 | 225
Movie Retrieval | LSMDC 17 (public test) | Recall@1 | 7.3 | 16
Movie Retrieval | LSMDC 2016 (test) | R@1 | 7.3 | 13
Video-to-Text Retrieval | LSMDC (test) | MC Accuracy | 69.7 | 8
Person Recognition | Casablanca benchmark 4 (full) | Accuracy | 83 | 5
Person Recognition | Buffy (Season 5) | AP (Episode 1) | 0.98 | 3