Learning better representations for crowded pedestrians in offboard LiDAR-camera 3D tracking-by-detection

About

Perceiving pedestrians in highly crowded urban environments is a difficult long-tail problem for learning-based autonomous perception. Speeding up 3D ground-truth generation for such scenes is critical to performance yet hard to achieve. The difficulties include the sparsity of the captured pedestrian point clouds and the lack of benchmarks suited to studying this specific system design. To tackle these challenges, we first collect a new multi-view LiDAR-camera 3D multiple-object-tracking benchmark of highly crowded pedestrians for in-depth analysis. We then build an offboard auto-labeling system that reconstructs pedestrian trajectories from LiDAR point clouds and multi-view images. To improve generalization to crowded scenes and performance on small objects, we propose to learn high-resolution representations that are density-aware and relationship-aware. Extensive experiments validate that our approach significantly improves 3D pedestrian tracking performance, leading to higher auto-labeling efficiency. The code will be publicly available at this HTTP URL.

Shichao Li, Peiliang Li, Qing Lian, Peng Yun, Xiaozhi Chen • 2025

Related benchmarks

Task	Dataset	Result	Rank
3D Multi-Object Tracking	nuScenes (val)	AMOTA 72.2	157

