TrouSPI-Net: Spatio-temporal attention on parallel atrous convolutions and U-GRUs for skeletal pedestrian crossing prediction

About

Understanding the behaviors and intentions of pedestrians remains one of the main challenges for vehicle autonomy, since accurate predictions of their intentions can guarantee both their safety and the driving comfort of vehicles. In this paper, we address pedestrian crossing prediction in urban traffic environments by linking the dynamics of a pedestrian's skeleton to a binary crossing intention. We introduce TrouSPI-Net: a context-free, lightweight, multi-branch predictor. TrouSPI-Net extracts spatio-temporal features at different time resolutions by encoding pseudo-images built from sequences of skeletal joint positions and processing them with parallel attention modules and atrous convolutions. The approach is further enhanced by processing additional features, such as relative distances between skeletal joints, bounding box positions, or ego-vehicle speed, with U-GRUs. We evaluate TrouSPI-Net and analyze its performance using the newly proposed evaluation procedures for JAAD and PIE, two large public naturalistic data sets for studying pedestrian behavior in traffic. Experimental results show that TrouSPI-Net achieves an F1 score of 0.76 on JAAD and 0.80 on PIE, outperforming the current state of the art while remaining lightweight and context-free.
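
The abstract names several building blocks without giving their exact form, so the pure-Python sketch below only illustrates the general techniques under stated assumptions; it is not the authors' implementation. Assumed (not from the source): a 2-channel (x, y) pseudo-image with frames as rows and joints as columns, min-max normalized per channel; pairwise Euclidean distances standing in for the "relative distances of skeletal joints" feature; 1D dilated (atrous) convolutions along the time axis run in parallel at several rates; and the binary F1 score with "crossing" as the positive class. All function names are hypothetical, and the attention modules and U-GRUs are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed input layout (hypothetical): a sequence is a list of T frames, and
# each frame is a list of J (x, y) joint coordinates from a 2D pose estimator.
Frame = List[Tuple[float, float]]


def skeleton_to_pseudo_image(frames: Sequence[Frame]) -> List[List[List[float]]]:
    """Encode T frames of J joints as a 2-channel pseudo-image indexed [channel][t][joint].

    Channel 0 holds x, channel 1 holds y; each channel is min-max normalized to
    [0, 1] over the whole sequence (an assumed normalization scheme).
    """
    image = []
    for c in range(2):
        values = [joint[c] for frame in frames for joint in frame]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # a constant coordinate maps to 0.0 instead of dividing by zero
        image.append([[(joint[c] - lo) / span for joint in frame] for frame in frames])
    return image


def relative_distances(frame: Frame) -> List[float]:
    """Euclidean distance for every joint pair (i < j) in a single frame."""
    return [
        math.dist(frame[i], frame[j])
        for i in range(len(frame))
        for j in range(i + 1, len(frame))
    ]


def atrous_conv1d(signal: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Dilated (atrous) 1D convolution along time with zero 'same' padding.

    Computed as cross-correlation, as deep-learning frameworks do. Taps are
    `dilation` samples apart, so the receptive field is
    dilation * (len(kernel) - 1) + 1 samples with no extra weights.
    """
    k = len(kernel)
    pad = dilation * (k - 1)
    left = pad // 2
    padded = [0.0] * left + list(signal) + [0.0] * (pad - left)
    return [
        sum(kernel[i] * padded[t + i * dilation] for i in range(k))
        for t in range(len(signal))
    ]


def parallel_atrous(
    signal: Sequence[float], kernel: Sequence[float], dilations: Sequence[int] = (1, 2, 4)
) -> Dict[int, List[float]]:
    """Run one branch per dilation rate over the same input.

    A trained network would learn separate weights per branch; a shared kernel
    is used here only to make the effect of the dilation rate visible.
    """
    return {d: atrous_conv1d(signal, kernel, d) for d in dilations}


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 score with 'crossing' (label 1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data (made up for illustration): 3 frames, 2 joints moving right.
    frames = [[(0.0, 0.0), (3.0, 4.0)], [(1.0, 0.0), (4.0, 4.0)], [(2.0, 0.0), (5.0, 4.0)]]
    print(skeleton_to_pseudo_image(frames)[0][0])  # x channel, first frame: [0.0, 0.6]
    print(relative_distances(frames[0]))  # [5.0]
    branches = parallel_atrous([1.0, 2.0, 3.0, 4.0, 5.0], kernel=[1.0, 1.0, 1.0])
    print(branches[1])  # [3.0, 6.0, 9.0, 12.0, 9.0]
    print(branches[2])  # [4.0, 6.0, 9.0, 6.0, 8.0]
    print(f1_score([1, 1, 0], [1, 0, 1]))  # 0.5
```

The point of the parallel branches is that dilation widens the temporal receptive field without adding weights: with a 3-tap kernel, rates 1, 2 and 4 cover 3, 5 and 9 frames respectively, which is one way a model can look at several time resolutions of the same pose sequence at once.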

Joseph Gesnouin, Steve Pechberti, Bogdan Stanciulescu, Fabien Moutarde (2021)

Related benchmarks

Task	Dataset	Result
Pedestrian Intention Prediction	JAAD (All)	Accuracy 85	49
Pedestrian Intention Prediction	JAAD Beh	Accuracy 64	19
Pedestrian crossing intention prediction	PIE set03 (test)	Accuracy 88	16
Pedestrian crossing intention prediction	JAADbeh (test)	Accuracy 64	15
Pedestrian Intention Prediction	PIE	Accuracy 88	11
