Object Tracking by Jointly Exploiting Frame and Event Domain

About

Inspired by the complementarity between conventional frame-based and bio-inspired event-based cameras, we propose a multi-modal approach that fuses visual cues from the frame and event domains to enhance single-object tracking performance, especially in degraded conditions (e.g., scenes with high dynamic range, low light, and fast-moving objects). The proposed approach can effectively and adaptively combine meaningful information from both domains. Its effectiveness is ensured by a novel cross-domain attention scheme, which enhances features through both self- and cross-domain attention; its adaptiveness is ensured by a specially designed weighting scheme, which adaptively balances the contributions of the two domains. To exploit event-based visual cues in single-object tracking, we construct a large-scale frame-event dataset, which we then use to train a novel frame-event fusion model. Extensive experiments show that the proposed approach outperforms state-of-the-art frame-based tracking methods by at least 10.4% and 11.9% in terms of representative success rate and precision rate, respectively. In addition, a thorough ablation study confirms the effectiveness of each key component of our approach.
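
To make the idea of adaptively balancing the two domains concrete, here is a minimal sketch of a softmax-weighted blend of two feature vectors. It is not the authors' weighting scheme, which the abstract does not specify. The function names `softmax` and `adaptive_fuse` and the scalar inputs `frame_score` and `event_score` are hypothetical; in a trained tracker such scores would presumably be predicted by the network rather than passed in by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(frame_feat, event_feat, frame_score, event_score):
    """Blend two equally sized feature vectors.

    frame_score and event_score are hypothetical per-domain reliability
    scores (an assumption for illustration). A higher score gives that
    domain a larger share of the fused feature.
    """
    if len(frame_feat) != len(event_feat):
        raise ValueError("feature vectors must have the same length")
    w_frame, w_event = softmax([frame_score, event_score])
    fused = [w_frame * f + w_event * e for f, e in zip(frame_feat, event_feat)]
    return fused, (w_frame, w_event)


# Equal scores give equal weights, so the result is the element-wise mean.
fused, weights = adaptive_fuse([1.0, 2.0], [3.0, 4.0], 0.0, 0.0)
```

Because the weights come from a softmax, they are positive and sum to one, so raising one domain's score (for example, the event score in a low-light scene) shifts the fused feature toward that domain without changing its overall scale.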

Jiqing Zhang, Xin Yang, Yingkai Fu, Xiaopeng Wei, Baocai Yin, Bo Dong • 2021

Related benchmarks

Task	Dataset	Metric	Result	
Object Tracking	VisEvent	PR Score	54.5	70
RGB-E Tracking	VisEvent	MPR	70.4	46
RGB-Event Object Tracking	COESOT	Success Rate (SR)	53.8	38
RGB-Event Object Tracking	FE108	Success Rate (SR)	63.3	30
Single Object Tracking	FE108 (test)	RSR	63.4	28
Visual Tracking	FE108	SR	63.4	25
Visual Object Tracking	COESOT	SR	50.3	18
Object Tracking	VisEvent 51 (test)	Rigid RSR	51	16
Object Tracking	FE240hz 61 (test)	RSR (HDR)	53.1	16
Visual Object Tracking	FE 240hz	Success Rate (SR)	55.6	16

Showing 10 of 23 rows
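
The success-rate and precision-rate style metrics listed above are commonly computed from per-frame bounding boxes. The sketch below assumes the usual single-object tracking convention: a frame counts toward the success rate when the overlap (IoU) between the predicted and ground-truth box exceeds a threshold, and toward the precision rate when the distance between box centers is within a pixel threshold. The default thresholds (0.5 and 20 pixels) and the `(x, y, w, h)` box layout are assumptions for illustration; each benchmark may define its metrics differently.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance in pixels between the centers of two boxes."""
    dx = (a[0] + a[2] / 2) - (b[0] + b[2] / 2)
    dy = (a[1] + a[3] / 2) - (b[1] + b[3] / 2)
    return math.hypot(dx, dy)


def success_rate(preds, gts, iou_threshold=0.5):
    """Fraction of frames whose IoU exceeds iou_threshold (assumed 0.5)."""
    hits = sum(iou(p, g) > iou_threshold for p, g in zip(preds, gts))
    return hits / len(gts)


def precision_rate(preds, gts, pixel_threshold=20.0):
    """Fraction of frames whose center error is within pixel_threshold."""
    hits = sum(center_error(p, g) <= pixel_threshold for p, g in zip(preds, gts))
    return hits / len(gts)


# Two frames: the first prediction is exact, the second is far off target.
preds = [(0, 0, 10, 10), (100, 100, 10, 10)]
gts = [(0, 0, 10, 10), (0, 0, 10, 10)]
sr = success_rate(preds, gts)
pr = precision_rate(preds, gts)
```

In this toy sequence one of the two frames passes each criterion, so both rates come out to 0.5; benchmark tables typically report such values as percentages.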
