Multilayer Perceptron-Based Fusion for Efficient and Accurate Siamese Tracking

About

Siamese visual trackers have recently advanced through increasingly sophisticated fusion mechanisms built on convolutional or Transformer architectures. However, both families struggle to deliver pixel-level interactions efficiently on resource-constrained hardware, leading to a persistent accuracy-efficiency imbalance. Motivated by this limitation, we redesign the Siamese neck with a simple yet effective Multilayer Perceptron (MLP)-based fusion module that enables pixel-level interaction with minimal structural overhead. Nevertheless, naively stacking MLP blocks introduces a new challenge: computational cost can scale quadratically with channel width. To overcome this, we construct a hierarchical search space of carefully designed MLP modules and introduce a customized relaxation strategy that enables differentiable neural architecture search (DNAS) to decouple channel-width optimization from other architectural choices. This targeted decoupling automatically balances channel width and depth, yielding a low-complexity architecture. The resulting tracker achieves state-of-the-art accuracy-efficiency trade-offs. It ranks among the top performers on four general-purpose and three aerial tracking benchmarks, while maintaining real-time performance on both resource-constrained Graphics Processing Units (GPUs) and Neural Processing Units (NPUs).
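
The quadratic scaling with channel width mentioned in the abstract can be illustrated with a rough cost count. The sketch below is not the paper's module: it assumes a generic two-layer channel MLP (C to r*C and back to C) applied independently at every spatial position, and the function name `channel_mlp_macs`, the token count, and the expansion ratio are all hypothetical choices for illustration.

```python
def channel_mlp_macs(num_tokens: int, channels: int, expansion: int = 4) -> int:
    """Multiply-accumulates for one two-layer channel MLP applied per token.

    Assumed layout (illustrative only): Linear(C -> r*C) then Linear(r*C -> C).
    Biases and activations are ignored.
    """
    hidden = expansion * channels
    # Each linear layer costs (in_features * out_features) MACs per token.
    per_token = channels * hidden + hidden * channels
    return num_tokens * per_token


# Same number of spatial positions, channel width doubled.
base = channel_mlp_macs(num_tokens=256, channels=64)
doubled = channel_mlp_macs(num_tokens=256, channels=128)
ratio = doubled / base  # doubling C multiplies the cost by 4 under these assumptions
print(base, doubled, ratio)
```

Under these assumptions the cost is proportional to C squared, so widening a block is far more expensive than it first appears, which is consistent with the abstract's motivation for searching channel width separately from other architectural choices.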

Tianqi Shen, Huakao Lin, Ning An • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Visual Object Tracking | GOT-10k (test) | -- | 408 |
| Visual Object Tracking | OTB 2015 | AUC 69.7 | 63 |
| Object Tracking | NFS 30 | AUC 61.1 | 7 |
| Object Tracking | VOT 2019 | EAO 34 | 4 |