Towards Distraction-Robust Active Visual Tracking
About
Active visual tracking is notoriously difficult when distracting objects appear, because distractors often mislead the tracker by occluding the target or by presenting a confusingly similar appearance. To address this issue, we propose a mixed cooperative-competitive multi-agent game, in which a target and multiple distractors form a collaborative team that plays against a tracker and tries to make it fail to follow the target. Through learning in this game, diverse distracting behaviors naturally emerge among the distractors and expose the tracker's weaknesses, which in turn helps improve the tracker's robustness to distraction. For effective learning, we further present a set of practical methods, including a reward function for the distractors, a cross-modal teacher-student learning strategy, and a recurrent attention mechanism for the tracker. Experimental results show that our tracker achieves the desired distraction-robust active visual tracking and generalizes well to unseen environments. We also show that the multi-agent game can be used to adversarially test the robustness of trackers.
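To make the "mixed cooperative-competitive" structure concrete, the sketch below shows one simple way to assign rewards in such a game: agents within the target team cooperate by sharing a reward, while the two sides compete because the team's reward is the negation of the tracker's. This is an illustrative assumption, not the paper's method. The tracker reward shape, the names `RelativePose`, `tracker_reward` and `team_rewards`, and all parameter values (`expected_distance`, `max_distance`, `max_angle`) are hypothetical, and the dedicated distractor reward proposed in the paper is not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class RelativePose:
    """Hypothetical target pose expressed in the tracker's local frame."""
    distance: float  # distance from the tracker to the target
    angle: float     # radians between the tracker's heading and the target


def tracker_reward(pose, expected_distance=2.0, max_distance=5.0,
                   max_angle=math.pi / 4, scale=1.0):
    """Assumed tracker reward: largest when the target is straight ahead
    at the expected distance, decreasing linearly with the normalized
    distance and angle errors (illustrative values, not from the paper)."""
    distance_error = abs(pose.distance - expected_distance) / max_distance
    angle_error = abs(pose.angle) / max_angle
    return scale - (distance_error + angle_error)


def team_rewards(pose, num_distractors):
    """Zero-sum split between the two sides: the target and every
    distractor share the negated tracker reward, so the team cooperates
    internally and competes against the tracker."""
    r = tracker_reward(pose)
    rewards = {"tracker": r, "target": -r}
    for i in range(num_distractors):
        rewards[f"distractor_{i}"] = -r
    return rewards


if __name__ == "__main__":
    # Target perfectly placed: the tracker gets +1.0 and each team member -1.0.
    print(team_rewards(RelativePose(distance=2.0, angle=0.0), num_distractors=2))
```

Sharing one negated reward is the simplest way to make the target team cooperative; in practice, distractors that only receive the team signal may learn slowly because credit for confusing the tracker is not attributed to individual distractors, which is one motivation for giving distractors a reward of their own.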
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Active Tracking | UnrealCV Parking Lot scene | EL | 331 | 21 |
| Embodied Visual Tracking | SimpleRoom Unseen Virtual Environment | EL | 500 | 16 |
| Embodied Visual Tracking | UrbanCity Unseen Virtual Environment | EL | 496 | 16 |
| Visual Active Tracking | UnrealCV Snow Village scene | EL | 424 | 11 |
| Visual Active Tracking | UnrealCV | EL | 474 | 11 |
| Visual Active Tracking | UnrealCV UrbanRoad scene | EL | 480 | 11 |
| Visual Active Tracking | UnrealCV UrbanCity 4D | EL | 381 | 10 |
| Visual Active Tracking | UnrealCV ComplexRoom 4D | EL | 401 | 10 |
| Visual Active Tracking | UnrealCV Average - Distractor Environments | EL | 371 | 10 |