Real-time Object Detection for Streaming Perception
About
Autonomous driving requires a model to perceive the environment and (re)act with low latency for safety. While past works ignore the inevitable changes in the environment that occur by the time processing finishes, streaming perception was proposed to combine latency and accuracy into a single metric for online video perception. In this paper, instead of searching for trade-offs between accuracy and speed as previous works do, we point out that endowing real-time models with the ability to predict the future is the key to dealing with this problem. We build a simple and effective framework for streaming perception. It is equipped with a novel DualFlow Perception module (DFP), which includes dynamic and static flows to capture the moving trend and the basic detection features for streaming prediction. Further, we introduce a Trend-Aware Loss (TAL) combined with a trend factor to generate adaptive weights for objects with different moving speeds. Our simple method achieves competitive performance on the Argoverse-HD dataset and improves the AP by 4.9% compared to the strong baseline, validating its effectiveness. Our code will be made available at https://github.com/yancie-yjr/StreamYOLO.
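To illustrate why forecasting matters under a streaming metric, here is a minimal sketch of the pairing step such an evaluation relies on: each ground-truth frame is compared against the most recent prediction that had already finished when that frame arrived. This is not the authors' code. The function names, the constant per-frame latency, and the assumption that every frame is processed without queueing are all hypothetical simplifications, and timestamps are integer milliseconds to keep the arithmetic exact.

```python
from bisect import bisect_right


def latest_available_prediction(finish_times, query_time):
    """Index of the newest prediction finished at or before query_time, else None.

    finish_times must be sorted in ascending order.
    """
    idx = bisect_right(finish_times, query_time) - 1
    return idx if idx >= 0 else None


def pair_for_streaming_eval(frame_times_ms, latency_ms):
    """Pair each ground-truth frame with the prediction available at its timestamp.

    Assumption (hypothetical): every frame is processed, and each prediction
    finishes exactly latency_ms after its input frame arrives.
    """
    finish_times = [t + latency_ms for t in frame_times_ms]
    return [
        (gt_idx, latest_available_prediction(finish_times, t))
        for gt_idx, t in enumerate(frame_times_ms)
    ]


# Roughly 30 FPS input with a detector that takes 20 ms per frame.
pairs = pair_for_streaming_eval([0, 33, 66, 99], latency_ms=20)
print(pairs)  # [(0, None), (1, 0), (2, 1), (3, 2)]
```

Even when the detector runs faster than the frame interval, the ground truth of frame t+1 ends up paired with the output computed from frame t. A detector that only describes its input frame is therefore always scored against a scene that has already moved on, which is the gap the abstract proposes to close by predicting the future state.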
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Streaming Perception | Argoverse-HD v1.0 (test) | sAP | 42.3 | 10 |
| Streaming Perception | Argoverse-HD v1.1 (test) | sAP | 36.7 | 9 |
| Streaming 3D Object Detection | KITTI Tracking Car v1 (test) | sAP BEV (IoU=0.5) Easy | 92.22 | 4 |
| Streaming 3D Object Detection | KITTI Tracking Pedestrian v1 (test) | sAP BEV (IoU=0.5) Easy | 74.54 | 4 |
| Streaming Object Detection | Argoverse-HD | sAP | 29.6 | 4 |
| Streaming 3D Object Detection | KITTI Tracking Cyclist v1 (test) | sAP BEV (IoU=0.5) Easy | 39.34 | 4 |