Object Tracking by Jointly Exploiting Frame and Event Domain
About
Inspired by the complementarity between conventional frame-based and bio-inspired event-based cameras, we propose a multi-modal approach that fuses visual cues from the frame and event domains to enhance single-object tracking performance, especially in degraded conditions (e.g., scenes with high dynamic range, low light, and fast-moving objects). The proposed approach combines meaningful information from both domains effectively and adaptively. Its effectiveness is ensured by a newly designed cross-domain attention scheme, which enhances features using both self- and cross-domain attention; its adaptiveness is ensured by a specially designed weighting scheme, which adaptively balances the contributions of the two domains. To exploit event-based visual cues in single-object tracking, we construct a large-scale frame-event dataset, which we then use to train a novel frame-event fusion model. Extensive experiments show that the proposed approach outperforms state-of-the-art frame-based tracking methods by at least 10.4% and 11.9% in terms of representative success rate and precision rate, respectively. In addition, a thorough ablation study confirms the effectiveness of each key component of our approach.
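The adaptive weighting idea described above can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes each domain yields a feature map of shape `(C, H, W)`, and the function name `adaptive_fuse` and the scoring vectors `w_frame` and `w_event` are hypothetical stand-ins for whatever learned parameters produce the per-domain weights.

```python
import numpy as np

def adaptive_fuse(frame_feat, event_feat, w_frame, w_event):
    """Blend two (C, H, W) feature maps using softmax-normalized domain weights.

    w_frame and w_event are hypothetical scoring vectors of length C; in a
    trained model they would be learned, here they are plain inputs.
    """
    # Global average pooling: one C-dimensional descriptor per domain.
    f_desc = frame_feat.mean(axis=(1, 2))
    e_desc = event_feat.mean(axis=(1, 2))

    # One scalar score per domain, then a numerically stable softmax.
    logits = np.array([f_desc @ w_frame, e_desc @ w_event])
    logits = logits - logits.max()
    weights = np.exp(logits) / np.exp(logits).sum()

    # Convex combination: the domain with the higher score contributes more.
    fused = weights[0] * frame_feat + weights[1] * event_feat
    return fused, weights

# Usage with random features and neutral (all-zero) scoring vectors.
rng = np.random.default_rng(0)
frame = rng.normal(size=(8, 4, 4))
event = rng.normal(size=(8, 4, 4))
fused, weights = adaptive_fuse(frame, event, np.zeros(8), np.zeros(8))
```

With all-zero scoring vectors both domains score equally, so the result is a plain average; non-zero vectors shift the balance toward one domain, for example toward events when the frame is badly exposed.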
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Single Object Tracking | FE108 (test) | RSR | 63.4 | 28 |
| Visual Tracking | FE108 | SR | 63.4 | 25 |
| Object Tracking | VisEvent 51 (test) | Rigid RSR | 51 | 16 |
| Object Tracking | FE240hz 61 (test) | RSR (HDR) | 53.1 | 16 |
| Single Object Tracking | FE108 HDR | RSR | 59.9 | 12 |
| Single Object Tracking | FE108 LL | RSR | 0.656 | 12 |
| Single Object Tracking | FE108 FWB | RSR | 71.2 | 12 |
| Single Object Tracking | FE108 FNB | RSR | 62.8 | 12 |
| Single Object Tracking | FE108 ALL | RSR | 63.4 | 12 |
| Object Tracking | FE108 (test) | PR | 92.4 | 11 |
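For readers unfamiliar with the success-rate and precision-rate metrics reported above, the following sketch shows how such scores are commonly computed from per-frame bounding boxes. The thresholds (IoU of 0.5 and a center distance of 20 pixels) and the `(x, y, w, h)` box layout are assumptions chosen for illustration; the benchmark's exact protocol is not stated here and may differ.

```python
import numpy as np

def iou(pred, gt):
    """Per-frame IoU for boxes given as (x, y, w, h) rows of shape (N, 4)."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-12)

def success_rate(pred, gt, iou_thr=0.5):
    """Fraction of frames whose IoU exceeds iou_thr (assumed threshold)."""
    return float(np.mean(iou(pred, gt) > iou_thr))

def precision_rate(pred, gt, dist_thr=20.0):
    """Fraction of frames whose box-center error is within dist_thr pixels."""
    pred_c = pred[:, :2] + pred[:, 2:] / 2.0
    gt_c = gt[:, :2] + gt[:, 2:] / 2.0
    dist = np.linalg.norm(pred_c - gt_c, axis=1)
    return float(np.mean(dist <= dist_thr))

# Two frames: one perfect prediction, one complete miss.
gt_boxes = np.array([[0.0, 0.0, 10.0, 10.0], [0.0, 0.0, 10.0, 10.0]])
pred_boxes = np.array([[0.0, 0.0, 10.0, 10.0], [100.0, 100.0, 10.0, 10.0]])
sr = success_rate(pred_boxes, gt_boxes)
pr = precision_rate(pred_boxes, gt_boxes)
```

In this toy case one of the two frames passes each test, so both rates come out to 0.5; multiplying by 100 gives the percentage form used for most rows in the table.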