Event Camera Data Pre-training

About

This paper proposes a pre-trained neural network for handling event camera data. Our model is a self-supervised learning framework that uses paired event camera data and natural RGB images for training. Our method contains three modules connected in sequence: i) a family of event data augmentations, which generates meaningful event images for self-supervised training; ii) a conditional masking strategy that samples informative event patches from event images, encouraging our model to capture the spatial layout of a scene and accelerating training; iii) a contrastive learning approach that enforces the similarity of embeddings between matching event images, and between paired event and RGB images. An embedding projection loss is proposed to avoid model collapse when enforcing the event image embedding similarities. A probability distribution alignment loss is proposed to encourage the event image to be consistent with its paired RGB image in the feature space. Transfer learning performance on downstream tasks shows that our method outperforms state-of-the-art methods. For example, we achieve a top-1 accuracy of 64.83% on the N-ImageNet dataset.
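The contrastive step in iii) can be illustrated with a minimal sketch. It does not reproduce the paper's embedding projection loss or probability distribution alignment loss, whose exact forms are not given here. Instead it swaps in a generic InfoNCE objective as a stand-in, applied once between two views of the same event image and once between an event image and its paired RGB image. The function names `info_nce` and `pretraining_loss`, the `temperature` value, and the `rgb_weight` parameter are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def info_nce(anchor, positive, temperature=0.1):
    """Generic InfoNCE loss (a stand-in, not the paper's loss).

    Row i of `anchor` should match row i of `positive`; every other row
    in the batch serves as a negative.
    """
    a = l2_normalize(anchor)
    p = l2_normalize(positive)
    logits = (a @ p.T) / temperature                      # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def pretraining_loss(z_event_a, z_event_b, z_rgb, rgb_weight=1.0):
    """Hypothetical combined objective: event-event term plus event-RGB term."""
    event_term = info_nce(z_event_a, z_event_b)
    rgb_term = info_nce(z_event_a, z_rgb)
    return event_term + rgb_weight * rgb_term


if __name__ == "__main__":
    # Random embeddings stand in for encoder outputs (batch of 8, dimension 16).
    rng = np.random.default_rng(0)
    z_rgb = rng.normal(size=(8, 16))
    z_event_a = z_rgb + 0.1 * rng.normal(size=(8, 16))
    z_event_b = z_rgb + 0.1 * rng.normal(size=(8, 16))
    print(pretraining_loss(z_event_a, z_event_b, z_rgb))
```

In this sketch, embeddings of matching pairs sit on the diagonal of the similarity matrix, so the loss falls as matching pairs become more similar than non-matching ones. On its own, an objective of this kind does not include the safeguards against collapse that the abstract attributes to the embedding projection loss.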

Yan Yang, Liyuan Pan, Liu Liu • 2023

Related benchmarks

Task                        Dataset                      Metric          Result  Rank
Optical Flow                MVSEC 1.0 (indoor_flying1)   EPE             0.6     52
Semantic segmentation       DDD17                        mIoU            59.15   50
Optical Flow                MVSEC 1.0 (indoor_flying2)   EPE             1.26    46
Optical Flow                MVSEC 1.0 (indoor_flying3)   EPE             1       46
Semantic segmentation       DDD17 (test)                 mIoU            54.66   46
Semantic segmentation       DSEC (test)                  mIoU            52.52   34
Semantic segmentation       DSEC-Semantic                mIoU            59.16   20
Monocular Depth Estimation  DSEC-Depth                   RMSE            11.473  20
Monocular Depth Estimation  MVSEC Depth                  RMSE            7.68    20
Object recognition          N-ImageNet 1.0 (test)        Top-1 Accuracy  64.83   13
Showing 10 of 20 rows
