Event-Based Visual Teach-and-Repeat via Fast Fourier-Domain Cross-Correlation
## About
Visual teach-and-repeat (VT&R) navigation enables robots to autonomously retrace previously demonstrated paths using visual feedback. We present a novel event-camera-based VT&R system. Our system formulates event-stream matching as frequency-domain cross-correlation, transforming spatial convolutions into efficient Fourier-space multiplications. By exploiting the binary structure of event frames and applying image compression techniques, we achieve a processing latency of just 2.88 ms, about 3.5 times faster than conventional camera-based baselines that are optimised for runtime efficiency. Experiments with a Prophesee EVK4 HD event camera mounted on an AgileX Scout Mini robot demonstrate successful autonomous navigation over more than 3000 m of indoor and outdoor trajectories in both daytime and nighttime conditions. Our system maintains Cross-Track Errors (XTE) below 15 cm, demonstrating the practical viability of event-based perception for real-time VT&R navigation.
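The core matching step described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name `fft_cross_correlate`, the toy binary frames, and the frame size are ours, and the sketch omits the binary-frame compression tricks the system relies on for its 2.88 ms latency. It only shows how spatial cross-correlation of a "teach" and a "repeat" event frame reduces to an element-wise product of their spectra, with the correlation peak giving the pixel offset between the two views.

```python
import numpy as np

def fft_cross_correlate(teach_frame, repeat_frame):
    """Estimate the pixel offset between two event frames via
    frequency-domain cross-correlation: by the correlation theorem,
    spatial correlation becomes element-wise multiplication of spectra."""
    F_teach = np.fft.rfft2(teach_frame)
    F_repeat = np.fft.rfft2(repeat_frame)
    # conj(F_teach) * F_repeat corresponds to cross-correlating the
    # repeat frame against the teach frame in the spatial domain.
    corr = np.fft.irfft2(np.conj(F_teach) * F_repeat, s=teach_frame.shape)
    # The correlation peak gives the (cyclic) shift of repeat w.r.t. teach.
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = teach_frame.shape
    # Map large positive cyclic offsets back to negative shifts.
    dx = int(dx) if dx <= w // 2 else int(dx) - w
    dy = int(dy) if dy <= h // 2 else int(dy) - h
    return dx, dy

# Toy binary event frames: the "repeat" view is the "teach" view
# shifted 3 pixels to the right.
teach = np.zeros((64, 64), dtype=np.float32)
teach[30:34, 20:24] = 1.0
repeat = np.roll(teach, 3, axis=1)
print(fft_cross_correlate(teach, repeat))  # (3, 0)
```

In a VT&R loop, the recovered horizontal offset would typically drive a steering correction toward the taught path; the FFT route makes this matching O(N log N) per frame instead of the O(N²) cost of a dense spatial search.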
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Teach and Repeat Navigation | Track 1 Indoor | Cross-Track Error (XTE) | 5.68 | 10 |
| Visual Teach and Repeat Navigation | Track 2 Indoor | Cross-Track Error (XTE) | 7.68 | 10 |
| Visual Teach and Repeat Navigation | Track 3 Indoor | Cross-Track Error (XTE) | 8.57 | 10 |
| Visual Teach and Repeat Navigation | Track 4 Outdoor | Cross-Track Error (XTE) | 5.27 | 10 |
| Visual Teach and Repeat Navigation | Track 5 Outdoor | Cross-Track Error (XTE) | 5.78 | 10 |
| Visual Teach and Repeat Navigation | Track 6 Outdoor, Night-time | Cross-Track Error (XTE) | 5.22 | 10 |