Learn to Match: Automatic Matching Network Design for Visual Tracking

About

Siamese tracking has achieved groundbreaking performance in recent years, where the essence is the efficient matching operator cross-correlation and its variants. Despite this remarkable success, the heuristic matching network design relies heavily on expert experience. Moreover, we find experimentally that a single matching operator can hardly guarantee stable tracking in all challenging environments. Thus, in this work, we introduce six novel matching operators from the perspective of feature fusion instead of explicit similarity learning, namely Concatenation, Pointwise-Addition, Pairwise-Relation, FiLM, Simple-Transformer and Transductive-Guidance, to explore more possibilities for matching operator selection. The analyses reveal these operators' selective adaptability to different types of environment degradation, which inspires us to combine them to exploit complementary features. To this end, we propose binary channel manipulation (BCM) to search for the optimal combination of these operators. BCM decides whether to retain or discard an operator by learning its contribution to other tracking steps. By inserting the learned matching networks into a strong baseline tracker, Ocean, our model achieves favorable gains of $67.2 \rightarrow 71.4$, $52.6 \rightarrow 58.3$, and $70.3 \rightarrow 76.0$ in success on OTB100, LaSOT, and TrackingNet, respectively. Notably, our tracker, dubbed AutoMatch, uses less than half the training data and time of the baseline tracker and runs at 50 FPS using PyTorch. Code and models will be released at https://github.com/JudasDie/SOTS.
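
To make the feature-fusion idea concrete, the following is a minimal NumPy sketch of three of the named operators (Concatenation, Pointwise-Addition, FiLM) followed by a binary keep/discard gate. It is an illustration under stated assumptions, not the authors' implementation: the pooled template vector, the random projection weights, the function names, and the hard-threshold gate `gated_sum` are all hypothetical, and the actual BCM search procedure is not described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 5, 5  # hypothetical channel count and search-map size


def concatenation(template, search, proj):
    """Stack the broadcast template with the search map, then project back to C channels."""
    tiled = np.broadcast_to(template[:, None, None], search.shape)
    stacked = np.concatenate([tiled, search], axis=0)   # (2C, H, W)
    return np.einsum("oc,chw->ohw", proj, stacked)      # (C, H, W)


def pointwise_addition(template, search):
    """Add the template vector to every spatial position of the search map."""
    return search + template[:, None, None]


def film(template, search, w_gamma, w_beta):
    """FiLM-style fusion: the template predicts a per-channel scale and shift."""
    gamma = w_gamma @ template
    beta = w_beta @ template
    return gamma[:, None, None] * search + beta[:, None, None]


def gated_sum(outputs, scores, threshold=0.5):
    """Assumed stand-in for a binary gate: keep an operator only if its score exceeds the threshold."""
    gates = np.asarray(scores) > threshold
    kept = (out for out, keep in zip(outputs, gates) if keep)
    return sum(kept, np.zeros_like(outputs[0])), gates


template = rng.standard_normal(C)          # pooled template feature (assumption)
search = rng.standard_normal((C, H, W))    # search-region feature map
proj = rng.standard_normal((C, 2 * C))
w_gamma = rng.standard_normal((C, C))
w_beta = rng.standard_normal((C, C))

outputs = [
    concatenation(template, search, proj),
    pointwise_addition(template, search),
    film(template, search, w_gamma, w_beta),
]
scores = [0.9, 0.2, 0.7]  # made-up per-operator scores, for illustration only
fused, gates = gated_sum(outputs, scores)
print(fused.shape, gates.tolist())
```

In this sketch every operator returns a map of the same shape so that the gated outputs can be summed; a real tracker would learn both the operator weights and the gate scores during training.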

Zhipeng Zhang, Yihao Liu, Xiao Wang, Bing Li, Weiming Hu • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Object Tracking | TrackingNet (test) | Normalized Precision (Pnorm) | 82.4 | 463 |
| Visual Object Tracking | LaSOT (test) | AUC | 64.9 | 446 |
| Object Tracking | LaSoT | AUC | 58.3 | 411 |
| Visual Object Tracking | GOT-10k (test) | Average Overlap | 67.1 | 408 |
| Object Tracking | TrackingNet | Precision (P) | 72.6 | 270 |
| Visual Object Tracking | GOT-10k | AO | 65.2 | 254 |
| Visual Object Tracking | UAV123 (test) | AUC | 64.4 | 188 |
| Visual Object Tracking | OTB-100 | AUC | 71.4 | 136 |
| Visual Object Tracking | TNL2K | AUC | 47.2 | 121 |
| Visual Object Tracking | LaSoText | AUC | 37.6 | 112 |
Showing 10 of 31 rows
