Learning Multi-Domain Convolutional Neural Networks for Visual Tracking

About

We propose a novel visual tracking algorithm based on the representations from a discriminatively trained Convolutional Neural Network (CNN). Our algorithm pretrains a CNN using a large set of videos with tracking ground-truths to obtain a generic target representation. Our network is composed of shared layers and multiple branches of domain-specific layers, where domains correspond to individual training sequences and each branch is responsible for binary classification to identify the target in each domain. We train the network with respect to each domain iteratively to obtain generic target representations in the shared layers. When tracking a target in a new sequence, we construct a new network by combining the shared layers in the pretrained CNN with a new binary classification layer, which is updated online. Online tracking is performed by evaluating the candidate windows randomly sampled around the previous target state. The proposed algorithm illustrates outstanding performance compared with state-of-the-art methods in existing tracking benchmarks.

Hyeonseob Nam, Bohyung Han• 2015

Related benchmarks

Task	Dataset	Result
Visual Object Tracking	TrackingNet (test)	Normalized Precision (Pnorm)70.5	502
Object Tracking	LaSoT	AUC39.7	498
Visual Object Tracking	LaSOT (test)	AUC39.7	470
Visual Object Tracking	GOT-10k (test)	Average Overlap29.9	450
Object Tracking	TrackingNet	Precision (P)61.8	327
Visual Object Tracking	GOT-10k	AO29.9	306
RGB-T Tracking	RGBT234 (test)	Precision Rate72.2	203
Visual Object Tracking	UAV123	AUC0.528	193
Visual Object Tracking	UAV123 (test)	AUC53.2	188
Visual Object Tracking	TNL2K	--	169

Showing 10 of 60 rows

Other info

Follow for update

@wizwand_team Discord