
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies

About

Event-based semantic segmentation (ESS) is a fundamental yet challenging task for event camera sensing. The difficulties in interpreting and annotating event data limit its scalability. While domain adaptation from images to event data can help mitigate this issue, differences in data representation between the two modalities require additional effort to resolve. In this work, for the first time, we synergize information from the image, text, and event-data domains and introduce OpenESS to enable scalable ESS in an open-world, annotation-efficient manner. We achieve this goal by transferring the semantically rich CLIP knowledge from image-text pairs to event streams. To pursue better cross-modality adaptation, we propose a frame-to-event contrastive distillation and a text-to-event semantic consistency regularization. Experimental results on popular ESS benchmarks show that our approach outperforms existing methods. Notably, we achieve 53.93% and 43.31% mIoU on DDD17 and DSEC-Semantic, respectively, without using either event or frame labels.
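
To give a rough sense of what a frame-to-event contrastive objective can look like, the sketch below computes a generic InfoNCE loss between paired event and frame feature vectors. It is an illustration only, not the authors' implementation: the function names (`l2_normalize`, `info_nce`), the temperature value, and the assumption that row i of each list forms a matched pair are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(event_feats, frame_feats, temperature=0.07):
    """Average InfoNCE loss over pairs.

    Assumption: event_feats[i] and frame_feats[i] describe the same region,
    so they form the positive pair; all other frame features are negatives.
    """
    ev = [l2_normalize(v) for v in event_feats]
    fr = [l2_normalize(v) for v in frame_feats]
    total = 0.0
    for i, e in enumerate(ev):
        # Cosine similarity of this event feature to every frame feature.
        logits = [sum(a * b for a, b in zip(e, f)) / temperature for f in fr]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(ev)


# Toy features: the loss is small when pairs line up and large when they do not.
events = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(events, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce(events, [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing such a loss pulls each event feature toward its paired frame feature and pushes it away from the others, which is the general idea behind distilling image-side knowledge into an event encoder.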

Lingdong Kong, Youquan Liu, Lai Xing Ng, Benoit R. Cottereau, Wei Tsang Ooi • 2024

Related benchmarks

Task                               Dataset                      Result              Rank
Semantic segmentation              DDD17                        mIoU 63             50
Semantic segmentation              DDD17 (test)                 mIoU 63             46
Semantic segmentation              DSEC (test)                  mIoU 57.21          34
Semantic segmentation              DDD17-Seg v1 (test)          mIoU 63             24
Semantic segmentation              DSEC-Semantic v1 (test)      mIoU 57.21          24
Semantic segmentation              DSEC-Semantic                mIoU 57.21          20
Event-based Semantic Segmentation  DDD17 (test)                 mIoU (General) 63   19
Event-based Semantic Segmentation  DSEC-Semantic (test)         Accuracy 90.21      14
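
For readers unfamiliar with the mIoU metric reported above, here is a minimal sketch of how mean intersection-over-union is commonly computed from flat lists of predicted and ground-truth class labels. The function name `mean_iou`, the `ignore_index=255` convention, and the choice to skip classes that never appear are assumptions for illustration; benchmark toolkits may differ in these details.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    Assumption: pred and target are equal-length sequences of integer labels,
    and every predicted label lies in [0, num_classes).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel, excluded from the metric
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with five pixels, the last of which is unlabeled.
score = mean_iou([0, 0, 1, 1, 2], [0, 1, 1, 1, 255], num_classes=3)
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, while class 2 is skipped because its only pixel is unlabeled, so the score is their average.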
