
DiffCrossGait: Trajectory-Level Alignment for 2D-3D Cross-Modal Gait Recognition via Latent Diffusion

About

Cross-modal 2D-3D gait recognition is impeded by inherent domain discrepancies between 2D silhouette and 3D LiDAR range-view representations. While prior methods align only final embeddings, we propose DiffCrossGait, which reformulates cross-modal matching as trajectory-level alignment in an identity-relevant latent diffusion space, rather than assuming full equivalence between 2D and 3D observations. By driving both modalities with shared Gaussian noise within a latent space, we enable continuous alignment throughout the generative evolution. We introduce a Tri-Phase Alignment Strategy that exploits varying noise intensities to enforce identity anchoring, dynamics consistency, and cross-modal structural recoverability, thereby constraining both modalities to share denoising dynamics and bottleneck structure, which promotes modality-invariant gait features. Crucially, our framework decouples generative alignment from the discriminative backbone; the diffusion mechanism serves exclusively as a training objective, ensuring high inference efficiency by eliminating the computational overhead of iterative denoising. Extensive experiments on the SUSTech1K and FreeGait benchmarks demonstrate that DiffCrossGait achieves state-of-the-art performance.
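The shared-noise idea can be illustrated with a small sketch. This is not the authors' implementation: the cosine noise schedule, the 8-dimensional toy latents, and every function name below are assumptions made for illustration only. It shows one property that follows directly from applying the same Gaussian sample to both modalities under a standard forward diffusion step: the noise cancels in the difference, so the gap between the noised 2D and 3D latents equals the clean gap scaled by sqrt(alpha_bar).

```python
import math
import random


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal fraction alpha_bar(t) under a cosine schedule (assumed)."""
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def add_noise(z0, eps, alpha_bar):
    """Forward diffusion: z_t = sqrt(alpha_bar) * z0 + sqrt(1 - alpha_bar) * eps."""
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * z + b * e for z, e in zip(z0, eps)]


def l2_gap(u, v):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def shared_noise_pair(z2d, z3d, t, num_steps, rng):
    """Noise both modality latents with ONE shared Gaussian sample at step t."""
    eps = [rng.gauss(0.0, 1.0) for _ in z2d]  # drawn once, used for both latents
    a = cosine_alpha_bar(t, num_steps)
    return add_noise(z2d, eps, a), add_noise(z3d, eps, a), a


def demo():
    rng = random.Random(0)
    # Hypothetical latents standing in for silhouette and LiDAR encoder outputs.
    z2d = [rng.gauss(0.0, 1.0) for _ in range(8)]
    z3d = [rng.gauss(0.0, 1.0) for _ in range(8)]
    clean_gap = l2_gap(z2d, z3d)
    for t in (100, 500, 900):  # low, medium, and high noise intensity
        n2d, n3d, a = shared_noise_pair(z2d, z3d, t, 1000, rng)
        print(t, l2_gap(n2d, n3d), math.sqrt(a) * clean_gap)


if __name__ == "__main__":
    demo()
```

In each printed row the two gap values agree up to floating-point error, because the shared noise term is identical for both latents. With independent noise per modality the difference would contain an extra random term, so distances measured at a given noise level would no longer reflect only the modality gap.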

Zhiyang Lu, Ming Cheng • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Cross-modal Gait Recognition | FreeGait (test) | Rank-1 Accuracy | 61.5 | 28
Gait Recognition | SUSTech1K 3D LiDAR → 2D Camera | Rank-1 Accuracy (Overall) | 63.8 | 14
Cross-modal Gait Recognition | SUSTech 2D Camera → 3D LiDAR 1K (test) | Overall Rank-1 Acc | 58.7 | 10
