Learning an Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking

About

Transformer-based models have improved visual tracking, but most still cannot run in real time on resource-limited devices, especially for unmanned aerial vehicle (UAV) tracking. To achieve a better balance between performance and efficiency, we propose AVTrack, an adaptive computation tracking framework that selectively activates transformer blocks through an Activation Module (AM), which dynamically optimizes the ViT architecture by engaging only the relevant components. To address extreme viewpoint variations, we propose to learn view-invariant representations via mutual information (MI) maximization. In addition, we propose AVTrack-MD, an enhanced tracker incorporating a novel MI maximization-based multi-teacher knowledge distillation framework. Leveraging multiple off-the-shelf AVTrack models as teachers, we maximize the MI between their aggregated softened features and the corresponding softened feature of the student model, improving the generalization and performance of the student, especially under noisy conditions. Extensive experiments show that AVTrack-MD achieves performance comparable to AVTrack while reducing model complexity and boosting average tracking speed by over 17%. Code is available at: https://github.com/wuyou3474/AVTrack.
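
The adaptive block activation described above can be illustrated with a toy sketch. This is not the AVTrack implementation: the gate form (a sigmoid over a linear projection of the current features), the fixed threshold of 0.5, the toy residual "block", and all names (`GatedBlock`, `run_adaptive`) are assumptions made purely for illustration. It only shows the general idea of skipping a block when its activation probability is low.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GatedBlock:
    """Stand-in for one transformer block with a hypothetical activation gate."""

    def __init__(self, gate_weights, gate_bias, scale):
        self.gate_weights = gate_weights  # assumed linear gate parameters
        self.gate_bias = gate_bias
        self.scale = scale  # toy block: residual update x + scale * x

    def activation_probability(self, features):
        # Gate probability depends on the features entering this block.
        logit = sum(w * f for w, f in zip(self.gate_weights, features))
        return sigmoid(logit + self.gate_bias)

    def forward(self, features):
        return [f + self.scale * f for f in features]


def run_adaptive(blocks, features, threshold=0.5):
    """Run only the blocks whose gate probability reaches the threshold.

    Returns the final features and the indices of the executed blocks.
    """
    executed = []
    for index, block in enumerate(blocks):
        if block.activation_probability(features) >= threshold:
            features = block.forward(features)
            executed.append(index)
    return features, executed


blocks = [
    GatedBlock([0.0, 0.0], 5.0, 0.1),   # gate strongly open
    GatedBlock([0.0, 0.0], -5.0, 0.1),  # gate strongly closed, block skipped
    GatedBlock([1.0, 0.0], 0.0, 0.5),   # gate depends on the first feature
]
output, executed = run_adaptive(blocks, [1.0, 2.0])
print(executed)  # [0, 2]
```

In this toy run the second block is skipped, so only two of the three blocks contribute to the computation. Skipping blocks in this way is the kind of saving that an input-dependent gate is meant to provide.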

You Wu, Yongxin Li, Mengyuan Liu, Xucheng Wang, Xiangyang Yang, Hengzhou Ye, Dan Zeng, Qijun Zhao, Shuiwang Li • 2024

Related benchmarks

Task	Dataset	Result
Visual Object Tracking	UAV123 (test)	--	188
Visual Tracking	UAV123	--	56
UAV Tracking	VisDrone 2018	Precision 86	55
Visual Object Tracking	UAV123	SUC 66.8	48
UAV Tracking	DTB70	Precision 0.843	32
UAV Tracking	UAVDT	Precision 82.1	32
Visual Object Tracking	UAVDT	Precision 83.1	23
Visual Object Tracking	DTB70	Precision 84.3	23
Visual Object Tracking	UAV123 10fps	Precision 83.3	23
Visual Object Tracking	DTB70 (test)	AUC 65	19

Showing 10 of 21 rows
