
Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points

About

We propose a method for human activity recognition from RGB data that does not rely on any pose information at test time and does not explicitly compute pose information internally. Instead, a visual attention module learns to predict glimpse sequences in each frame. These glimpses correspond to interest points in the scene that are relevant to the classified activities. No spatial coherence is enforced on the glimpse locations, which gives the module the freedom to explore different points in each frame and to better optimize how it scrutinizes visual information. Tracking and sequentially integrating this kind of unstructured data is a challenge, which we address by separating the set of glimpses from a set of recurrent tracking/recognition workers. These workers receive glimpses and jointly perform subsequent motion tracking and activity prediction. The glimpses are soft-assigned to the workers, optimizing the coherence of the assignments in space, time and feature space using an external memory module. No hard decisions are taken, i.e., each glimpse point is assigned to all existing workers, albeit with different importance. Our method outperforms state-of-the-art methods on the largest human activity recognition dataset available to date, the NTU RGB+D Dataset, and on a smaller human action recognition dataset, the Northwestern-UCLA Multiview Action 3D Dataset. Our code is publicly available at https://github.com/fabienbaradel/glimpse_clouds.

Fabien Baradel, Christian Wolf, Julien Mille, Graham W. Taylor (2018)
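The abstract's central mechanism is that every glimpse is soft-assigned to every worker instead of being hard-assigned to one. The sketch below illustrates only that idea, in plain Python, under stated assumptions: the scoring function (a dot product between a glimpse feature vector and a per-worker memory vector), the helper names `soft_assign` and `worker_inputs`, and the toy vectors are all hypothetical. The paper's actual assignment also enforces coherence in space and time through an external memory module with learned parameters, which is not reproduced here.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_assign(glimpses: List[Vector], workers: List[Vector]) -> List[List[float]]:
    """Return A where A[g][w] is the weight of glimpse g for worker w.

    Hypothetical scoring: dot product between the glimpse feature and the
    worker's memory vector. Each row is a softmax over workers, so every
    glimpse goes to every worker (no hard decision) and the row sums to 1.
    """
    return [softmax([dot(g, w) for w in workers]) for g in glimpses]


def worker_inputs(glimpses: List[Vector], assignment: List[List[float]],
                  num_workers: int) -> List[Vector]:
    """Each worker's input is the assignment-weighted sum of all glimpse features."""
    dim = len(glimpses[0])
    inputs = [[0.0] * dim for _ in range(num_workers)]
    for g, row in zip(glimpses, assignment):
        for w, weight in enumerate(row):
            for d in range(dim):
                inputs[w][d] += weight * g[d]
    return inputs


if __name__ == "__main__":
    # Toy example: three 2-D glimpse features, two workers (hypothetical values).
    glimpses = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    workers = [[2.0, 0.0], [0.0, 2.0]]
    A = soft_assign(glimpses, workers)
    for row in A:
        print([round(x, 3) for x in row])
    print(worker_inputs(glimpses, A, len(workers)))
```

Because each row of the assignment is a softmax, a glimpse contributes to every worker with weights that sum to one, which mirrors the "no hard decisions" property. Replacing the softmax with an argmax would give a hard assignment, but it would no longer be differentiable with respect to the scores.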

Related benchmarks

Task | Dataset | Metric | Result (%) | Rank
Action Recognition | NTU RGB+D (Cross-View) | Accuracy | 93.2 | 652
Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy | 93.2 | 588
Action Recognition | NTU RGB+D (Cross-subject) | Accuracy | 86.6 | 500
Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy | 86.6 | 467
Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy | 86.6 | 336
Action Recognition | NTU RGB+D 120 Cross-Subject | Accuracy | 83.52 | 222
Action Recognition | NTU 120 (Cross-Setup) | Accuracy | 83.84 | 203
Action Recognition | Northwestern-UCLA (NUCLA) Multiview (cross-view) | Mean Accuracy | 90.1 | 45
Action Recognition | N-UCLA | Accuracy | 87.6 | 36
Action Recognition | UTD-MHAD (cross-subject) | Accuracy | 84.19 | 36

Showing 10 of 19 rows.

Other info

Code: https://github.com/fabienbaradel/glimpse_clouds
