End-to-End Learning of Representations for Asynchronous Event-Based Data
About
Event cameras are vision sensors that record asynchronous streams of per-pixel brightness changes, referred to as "events". They have appealing advantages over frame-based cameras for computer vision, including high temporal resolution, high dynamic range, and no motion blur. Due to the sparse, non-uniform spatiotemporal layout of the event signal, pattern recognition algorithms typically aggregate events into a grid-based representation and subsequently process it with a standard vision pipeline, e.g., a Convolutional Neural Network (CNN). In this work, we introduce a general framework to convert event streams into grid-based representations through a sequence of differentiable operations. Our framework comes with two main advantages: (i) it allows learning the input event representation together with the task-dedicated network in an end-to-end manner, and (ii) it lays out a taxonomy that unifies the majority of extant event representations in the literature and identifies novel ones. Empirically, we show that learning the event representation end-to-end yields an improvement of approximately 12% on optical flow estimation and object recognition over state-of-the-art methods.
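As a concrete illustration of the grid-based conversion discussed above, the sketch below accumulates a stream of events (pixel coordinates, timestamp, polarity) into a voxel grid of temporal bins using a fixed triangular kernel. This is a minimal, hand-written baseline, not the paper's method: the function name, its parameters, and the choice of kernel are all illustrative — the framework's point is that such a kernel can be made learnable and trained jointly with the task network.

```python
import numpy as np

def events_to_voxel_grid(x, y, t, p, num_bins, height, width):
    """Accumulate events into a (num_bins, height, width) grid.

    x, y : integer pixel coordinates of each event
    t    : timestamps (any monotonic unit)
    p    : polarities, e.g. +1 / -1

    Uses a fixed triangular temporal kernel; a learned, differentiable
    kernel would replace it in an end-to-end setting.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    # Normalize timestamps to the bin range [0, num_bins - 1].
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    for b in range(num_bins):
        # Triangular kernel: weight decays linearly with temporal distance
        # to bin b; each event's weights over all bins sum to 1.
        weight = np.maximum(0.0, 1.0 - np.abs(t_norm - b))
        # Scatter-add handles repeated (y, x) coordinates correctly.
        np.add.at(grid[b], (y, x), p * weight)
    return grid
```

Because the kernel weights are a continuous function of the timestamps, the whole mapping is differentiable in the sense needed for end-to-end training when re-expressed in an autodiff framework.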
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR10-DVS (test) | Accuracy | 74.9% | 80 |
| Image Classification | N-MNIST (test) | Accuracy | 99.1% | 69 |
| Object Classification | N-CARS (test) | Accuracy | 92.5% | 53 |
| Object Classification | N-Caltech101 (test) | Accuracy | 83.7% | 51 |
| Optic Flow Estimation | MVSEC (indoor_flying2) | AEE | 1.38 | 51 |
| Optic Flow Estimation | MVSEC (indoor_flying3) | AEE | 1.4 | 51 |
| Optical Flow | MVSEC 1.0 (indoor_flying1) | EPE | 0.96 | 43 |
| Optical Flow | MVSEC 1.0 (indoor_flying2) | EPE | 1.38 | 37 |
| Optical Flow | MVSEC 1.0 (indoor_flying3) | EPE | 1.4 | 37 |
| Event-based action recognition | HARDVS | Top-1 Acc | 36.51% | 22 |