End-to-End Learning of Representations for Asynchronous Event-Based Data
About
Event cameras are vision sensors that record asynchronous streams of per-pixel brightness changes, referred to as "events". They have appealing advantages over frame-based cameras for computer vision, including high temporal resolution, high dynamic range, and no motion blur. Due to the sparse, non-uniform spatiotemporal layout of the event signal, pattern recognition algorithms typically aggregate events into a grid-based representation and subsequently process it with a standard vision pipeline, e.g., a Convolutional Neural Network (CNN). In this work, we introduce a general framework to convert event streams into grid-based representations through a sequence of differentiable operations. Our framework comes with two main advantages: (i) it allows learning the input event representation together with the task-dedicated network in an end-to-end manner, and (ii) it lays out a taxonomy that unifies the majority of extant event representations in the literature and identifies novel ones. Empirically, we show that learning the event representation end-to-end yields an improvement of approximately 12% on optical flow estimation and object recognition over state-of-the-art methods.
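As a concrete illustration of the grid-based conversion discussed above, the sketch below accumulates a stream of events (pixel coordinates, timestamp, polarity) into a voxel grid of temporal bins using a fixed triangular kernel. This is a minimal, hand-written baseline, not the paper's method: the function name, its parameters, and the choice of kernel are all illustrative — the framework's point is that such a kernel can be made learnable and trained jointly with the task network.

```python
import numpy as np

def events_to_voxel_grid(x, y, t, p, num_bins, height, width):
    """Accumulate events into a (num_bins, height, width) grid.

    x, y : integer pixel coordinates of each event
    t    : timestamps (any monotonic unit)
    p    : polarities, e.g. +1 / -1

    Uses a fixed triangular temporal kernel; a learned, differentiable
    kernel would replace it in an end-to-end setting.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    # Normalize timestamps to the bin range [0, num_bins - 1].
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    for b in range(num_bins):
        # Triangular kernel: weight decays linearly with temporal distance
        # to bin b; each event's weights over all bins sum to 1.
        weight = np.maximum(0.0, 1.0 - np.abs(t_norm - b))
        # Scatter-add handles repeated (y, x) coordinates correctly.
        np.add.at(grid[b], (y, x), p * weight)
    return grid
```

Because the kernel weights are a continuous function of the timestamps, the whole mapping is differentiable in the sense needed for end-to-end training when re-expressed in an autodiff framework.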
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR10-DVS (test) | Accuracy | 74.9% | 80 |
| Image Classification | N-MNIST (test) | Accuracy | 99.1% | 69 |
| Object Classification | N-CARS (test) | Accuracy | 92.5% | 53 |
| Object Classification | N-Caltech101 (test) | Accuracy | 83.7% | 51 |
| Optic Flow Estimation | MVSEC (indoor_flying2) | AEE | 1.38 | 51 |
| Optic Flow Estimation | MVSEC (indoor_flying3) | AEE | 1.4 | 51 |
| Optical Flow | MVSEC 1.0 (indoor_flying1) | EPE | 0.96 | 43 |
| Optical Flow | MVSEC 1.0 (indoor_flying2) | EPE | 1.38 | 37 |
| Optical Flow | MVSEC 1.0 (indoor_flying3) | EPE | 1.4 | 37 |
| Event-based action recognition | HARDVS | Top-1 Acc | 36.51% | 22 |