Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection
About
In frame-based vision, object detection suffers substantial performance degradation under challenging conditions because of the limited sensing capability of conventional cameras. Event cameras output sparse, asynchronous events and offer a potential solution to these problems. However, effectively fusing the two heterogeneous modalities remains an open issue. In this work, we propose a novel hierarchical feature refinement network for event-frame fusion. Its core is a coarse-to-fine fusion module, the cross-modality adaptive feature refinement (CAFR) module. In the first phase, the bidirectional cross-modality interaction (BCI) part bridges information from the two distinct sources. The features are then further refined in the two-fold adaptive feature refinement (TAFR) part by aligning the channel-level mean and variance. We conducted extensive experiments on two benchmarks: the low-resolution PKU-DDD17-Car dataset and the high-resolution DSEC dataset. Experimental results show that our method surpasses the state of the art by a margin of 8.0% on the DSEC dataset. In addition, our method exhibits significantly better robustness (69.5% versus 38.7%) when 15 different corruption types are introduced to the frame images. The code is available at https://github.com/HuCaoFighting/FRN.
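To make the idea of aligning channel-level mean and variance concrete, the following is a minimal NumPy sketch. It is not the authors' TAFR implementation. It assumes an AdaIN-style re-normalization, a `(C, H, W)` feature layout, and that event features are aligned to frame features; the function name `align_channel_stats`, the `eps` value, and the random inputs are all hypothetical and chosen only for illustration.

```python
import numpy as np


def align_channel_stats(source, reference, eps=1e-5):
    """Rescale `source` so each channel takes on the mean/std of `reference`.

    Both inputs are assumed to have shape (C, H, W). This is an
    AdaIN-style illustration, not the paper's TAFR module.
    """
    # Per-channel statistics, computed over the spatial dimensions.
    src_mean = source.mean(axis=(1, 2), keepdims=True)
    src_std = source.std(axis=(1, 2), keepdims=True)
    ref_mean = reference.mean(axis=(1, 2), keepdims=True)
    ref_std = reference.std(axis=(1, 2), keepdims=True)

    # Whiten each source channel, then apply the reference statistics.
    normalized = (source - src_mean) / (src_std + eps)
    return normalized * ref_std + ref_mean


# Hypothetical feature maps standing in for the two modalities.
rng = np.random.default_rng(0)
frame_feat = rng.normal(loc=2.0, scale=3.0, size=(4, 8, 8))
event_feat = rng.normal(loc=-1.0, scale=0.5, size=(4, 8, 8))

aligned = align_channel_stats(event_feat, frame_feat)
```

After the call, each channel of `aligned` has approximately the same mean and standard deviation as the matching channel of `frame_feat`, while keeping the spatial pattern of `event_feat`. Which modality serves as the reference, and whether the alignment is applied in both directions, is a design choice this sketch does not settle.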
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Object Detection | DSEC (test) | mAP (Car) | 49.9 | 29 |
| Object Detection | PKU-DDD Car 17 | mAP50 | 86.7 | 20 |
| Object Detection | DSEC Corrupted 1.0 (test) | Average mPC50 | 69.5 | 15 |
| Object Detection | PKUDDD17-CAR All day (test) | mAP (0.5:0.95) | 46 | 14 |
| Object Detection | PKUDDD17-CAR Day (test) | mAP (0.50:0.95) | 46.9 | 14 |
| Object Detection | PKUDDD CAR Night 17 (test) | mAP (IoU 0.50:0.95) | 42.1 | 14 |
| Steering Prediction | DDD 20 | RMSE | 0.0409 | 10 |
| Steering Prediction | DRFuser (test) | RMSE | 0.2209 | 6 |