From Theory to Throughput: CUDA-Optimized APML for Large-Batch 3D Learning

About

Loss functions are fundamental to learning accurate 3D point cloud models, yet common choices trade geometric fidelity for computational cost. Chamfer Distance is efficient but permits many-to-one correspondences, while Earth Mover's Distance better reflects one-to-one transport at high computational cost. APML approximates transport with differentiable Sinkhorn iterations and an analytically derived temperature, but its dense formulation scales quadratically in memory. We present CUDA-APML, a sparse GPU implementation that thresholds negligible assignments and runs adaptive softmax, bidirectional symmetrization, and Sinkhorn normalization directly in COO form. This yields near-linear memory scaling and preserves gradients on the stored support, while pairwise distance evaluation remains quadratic in the current implementation. On ShapeNet and MM-Fi, CUDA-APML matches dense APML within a small tolerance while reducing peak GPU memory by 99.9%. Code is available at: https://github.com/Multimodal-Sensing-Lab/apml
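
The sparse pipeline described in the abstract can be illustrated with a small CPU sketch in pure Python. This is not the authors' CUDA code: the function name `sparse_soft_assignment`, the fixed `temperature`, the `threshold` value, and the iteration count are all hypothetical choices made for illustration. In particular, APML derives its temperature analytically, and the bidirectional symmetrization step is omitted here. The sketch only shows the general idea of thresholding a row-wise softmax over pairwise distances, storing the survivors as COO triples, and running Sinkhorn-style row and column normalization on that stored support.

```python
import math

def sparse_soft_assignment(pred, target, temperature=0.1, threshold=1e-3, iters=20):
    """Toy sparse soft assignment in COO form (illustrative assumptions only)."""
    rows, cols, vals, costs = [], [], [], []
    # Distances are still evaluated for every pair (quadratic time),
    # but only assignments above the threshold are stored.
    for i, p in enumerate(pred):
        dists = [math.dist(p, q) for q in target]
        d_min = min(dists)
        # Row-wise softmax over negative distances, shifted for stability.
        weights = [math.exp(-(d - d_min) / temperature) for d in dists]
        total = sum(weights)
        for j, (d, w) in enumerate(zip(dists, weights)):
            prob = w / total
            # Always keep the nearest target so no row ends up empty.
            if prob >= threshold or d == d_min:
                rows.append(i)
                cols.append(j)
                vals.append(prob)
                costs.append(d)
    n, m = len(pred), len(target)
    # Sinkhorn-style normalization restricted to the stored entries.
    for _ in range(iters):
        row_sum = [0.0] * n
        for r, v in zip(rows, vals):
            row_sum[r] += v
        vals = [v / row_sum[r] for r, v in zip(rows, vals)]
        col_sum = [0.0] * m
        for c, v in zip(cols, vals):
            col_sum[c] += v
        vals = [v / col_sum[c] for c, v in zip(cols, vals)]
    # Transport-style loss: assignment weight times distance, summed over the support.
    loss = sum(v * d for v, d in zip(vals, costs))
    return rows, cols, vals, loss

# Three predicted points and a slightly perturbed, permuted target set.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
target = [(0.0, 1.0, 0.1), (0.1, 0.0, 0.0), (1.0, 0.1, 0.0)]
rows, cols, vals, loss = sparse_soft_assignment(pred, target)
```

In this toy case only 3 of the 9 possible assignments survive the threshold, which is the effect the sparse formulation relies on: the number of stored entries grows with the number of plausible matches rather than with the full pairwise matrix.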

Sasan Sharifipour, Constantino Álvarez Casado, Manuel Lage Cañellas, Miguel Bordallo López • 2025

Related benchmarks

Task	Dataset	Result	Rank
Point Cloud Completion	ShapeNet-34	F1 Score: 20	5
Point Cloud Generation	MM-Fi	EMD (x100): 16.1	3