Improving Spatiotemporal Self-Supervision by Deep Reinforcement Learning

About

Self-supervised learning of convolutional neural networks can harness large amounts of cheap unlabeled data to train powerful feature representations. As a surrogate task, we jointly address the ordering of visual data in the spatial and temporal domains. The permutations of training samples, which are at the core of self-supervision by ordering, have so far been sampled randomly from a fixed preselected set. Based on deep reinforcement learning, we propose a sampling policy that adapts to the state of the network being trained. New permutations are therefore sampled according to their expected utility for updating the convolutional feature representation. Experimental evaluation on unsupervised and transfer learning tasks demonstrates competitive performance on standard benchmarks for image and video classification and nearest-neighbor retrieval.

Uta Büchler, Biagio Brattoli, Björn Ommer • 2018
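
The abstract contrasts uniform sampling from a fixed permutation set with sampling according to expected utility. The sketch below illustrates only that contrast and is not the authors' method: the paper learns its policy with deep reinforcement learning, whereas here a plain softmax over hand-picked utility scores stands in for the learned policy. The names `softmax`, `sample_permutation`, `shuffle_by`, the candidate permutations, and the utility values are all assumptions made for illustration.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw utility scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_permutation(permutations, utilities, rng, temperature=1.0):
    """Pick one permutation, favoring those with higher utility.

    With equal utilities this reduces to uniform sampling from the set.
    """
    probs = softmax(utilities, temperature)
    index = rng.choices(range(len(permutations)), weights=probs, k=1)[0]
    return permutations[index], probs


def shuffle_by(permutation, parts):
    """Reorder image tiles or video frames according to a permutation."""
    return [parts[i] for i in permutation]


rng = random.Random(0)
candidates = [(0, 1, 2, 3), (1, 0, 3, 2), (3, 2, 1, 0)]
utilities = [0.1, 0.5, 2.0]  # hypothetical scores, not taken from the paper

perm, probs = sample_permutation(candidates, utilities, rng)
shuffled = shuffle_by(perm, ["a", "b", "c", "d"])
```

In a training loop, the network would be asked to predict which permutation produced `shuffled`; in this sketch the utilities are fixed, while an adaptive policy would update them as the network's state changes.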

Related benchmarks

Task	Dataset	Result
Action Recognition	UCF101 (test)	Accuracy 58.6	376
Action Recognition	UCF101 (mean of 3 splits)	Accuracy 58.6	357
Action Recognition	HMDB51 (test)	Accuracy 0.25	249
Action Recognition	HMDB-51 (average of three splits)	Top-1 Acc 25	204
Action Classification	HMDB51 (over all three splits)	Accuracy 25	121
Video Retrieval	UCF101 (1)	Top-1 Acc 25.7	97
Video Retrieval	UCF101	Top-1 Acc 25.7	63
Video Retrieval	UCF101 (test)	Top-1 Acc 25.7	55
