Event Transformer. A sparse-aware solution for efficient event data processing

About

Event cameras are sensors of great interest for many applications that run in low-resource and challenging environments. They log sparse illumination changes with high temporal resolution and high dynamic range while consuming minimal power. However, top-performing methods often ignore specific event-data properties, leading to the development of generic but computationally expensive algorithms. Efforts toward efficient solutions usually do not achieve top-accuracy results for complex tasks. This work proposes a novel framework, Event Transformer (EvT), that effectively takes advantage of event-data properties to be highly efficient and accurate. We introduce a new patch-based event representation and a compact transformer-like architecture to process it. EvT is evaluated on different event-based benchmarks for action and gesture recognition. Evaluation results show better or comparable accuracy to the state of the art while requiring significantly fewer computational resources, which lets EvT run with minimal latency on both GPU and CPU.

Alberto Sabater, Luis Montesano, Ana C. Murillo • 2022
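
The abstract mentions a patch-based event representation that exploits the sparsity of event data. The sketch below is one possible illustration of that general idea, not the authors' implementation: it groups events into non-overlapping patches, builds a per-patch polarity histogram, and keeps only patches with enough activity. The function name `active_patches`, the `(x, y, t, polarity)` event layout, and the `patch_size` and `min_events` parameters are all assumptions made for this example.

```python
from collections import defaultdict


def active_patches(events, patch_size=4, min_events=2):
    """Group events into non-overlapping patches and keep only the active ones.

    events: iterable of (x, y, t, polarity) tuples, with polarity in {0, 1}
            (hypothetical layout chosen for this sketch).
    Returns a dict mapping (patch_row, patch_col) to a flat histogram of
    length 2 * patch_size**2 (one bin per pixel and polarity).
    """
    patches = defaultdict(lambda: [0] * (2 * patch_size * patch_size))
    counts = defaultdict(int)
    for x, y, _t, polarity in events:
        key = (y // patch_size, x // patch_size)
        # Index of the pixel inside its patch, in row-major order.
        local = (y % patch_size) * patch_size + (x % patch_size)
        patches[key][2 * local + polarity] += 1
        counts[key] += 1
    # Sparse-aware step: patches with too few events are dropped entirely,
    # so a downstream model would only see regions where something changed.
    return {k: v for k, v in patches.items() if counts[k] >= min_events}


# Three events fall in the top-left patch, one isolated event elsewhere.
events = [(0, 0, 0.001, 1), (1, 0, 0.002, 1), (1, 1, 0.003, 0), (9, 9, 0.004, 1)]
tokens = active_patches(events, patch_size=4, min_events=2)
```

With these toy events, only the top-left patch survives the activity filter. In a setup like this, the number of retained patches scales with scene activity rather than with sensor resolution, which is the kind of saving a sparse-aware method aims for.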

Related benchmarks

Task	Dataset	Result
Gesture Recognition	DVS128-Gesture (test)	Accuracy 96.2	30
Action Recognition	DVS128Gesture	Accuracy 94.4	18
Action Recognition	SL-Animals 4Sets	Accuracy 88.12	15
Action Recognition	SL-Animals 3Sets	Accuracy 87.45	13
Action Recognition	DVSGesture (full)	Accuracy 96.2	11
Classification	SL-Animals-DVS 47 (all subsets)	Accuracy 88.12	8
Event-based action recognition	DVS128 Gesture	Top-1 Acc 96.2	8
Event-based action recognition	SeAct	Top-1 Acc 61.3	4

