
GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space

About

In autonomous driving, multi-agent collaborative perception enhances sensing capabilities by enabling agents to share perceptual data. A key challenge lies in handling heterogeneous features from agents equipped with different sensing modalities or model architectures, which complicates data fusion. Existing approaches often require retraining encoders or designing interpreter modules for pairwise feature alignment, but these solutions are not scalable in practice. To address this, we propose GT-Space, a flexible and scalable collaborative perception framework for heterogeneous agents. GT-Space constructs a common feature space from ground-truth labels, providing a unified reference for feature alignment. With this shared space, each agent needs only a single adapter module to project its features, eliminating the need for pairwise interactions with other agents. Furthermore, we design a fusion network trained with contrastive losses across diverse modality combinations. Extensive experiments on simulation datasets (OPV2V and V2XSet) and a real-world dataset (RCooper) demonstrate that GT-Space consistently outperforms baselines in detection accuracy while delivering robust performance. Our code will be released at https://github.com/KingScar/GT-Space.
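The core idea in the abstract — one per-agent adapter projecting heterogeneous features into a shared ground-truth-derived space, trained with a contrastive loss — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the adapter shapes, the `info_nce` loss variant, and the GT anchor construction are all assumptions for demonstration.

```python
import numpy as np

def adapter(feat, W, b):
    # Per-agent linear adapter: projects native features into the shared GT space.
    # (The paper's adapter may be a deeper module; a single linear map is assumed here.)
    return feat @ W + b

def info_nce(z, anchors, tau=0.1):
    # InfoNCE-style contrastive loss: projected feature z[i] is pulled toward
    # anchors[i] (its positive) and pushed from all other anchors (negatives).
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    logits = z @ a.T / tau                       # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives sit on the diagonal

rng = np.random.default_rng(0)
D_common = 16
# Two heterogeneous agents with different native feature dimensions
# (e.g. a 64-d LiDAR encoder and a 32-d camera encoder; sizes are illustrative).
lidar_feat  = rng.normal(size=(5, 64))
camera_feat = rng.normal(size=(5, 32))
W_l, b_l = rng.normal(size=(64, D_common)) * 0.1, np.zeros(D_common)
W_c, b_c = rng.normal(size=(32, D_common)) * 0.1, np.zeros(D_common)
# Stand-in for the ground-truth feature space: one anchor per scene object.
gt_anchors = rng.normal(size=(5, D_common))

# Each agent projects with its own single adapter; no pairwise alignment needed.
z = np.vstack([adapter(lidar_feat, W_l, b_l), adapter(camera_feat, W_c, b_c)])
anchors = np.vstack([gt_anchors, gt_anchors])  # both agents target the same GT space
loss = info_nce(z, anchors)
```

Because every agent aligns to the same fixed reference space, adding an agent with a new modality only requires training its one adapter, rather than one interpreter per existing agent pair.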

Wentao Wang, Haoran Xu, Guang Tan · 2026

Related benchmarks

Task | Dataset | Result | Rank
3D Object Detection | OPV2V | AP@0.50: 89.1 | 146
3D Object Detection | V2XSet | AP@0.50: 87.4 | 70
Collaborative Perception | V2XSet (test) | AP@50: 87.3 | 32
Collaborative Perception | OPV2V (test) | AP@50: 89.4 | 32
3D Multi-Object Tracking | RCooper | AMOTA: 23.6 | 7
Object Detection | V2XSet | Performance Score 1: 85.8 | 7
3D Object Detection | RCooper | AP@50 (A1): 47.7 | 7
3D Object Detection | RCooper (test) | Base Score: 89.1 | 4
