STAMP: Scalable Task And Model-agnostic Collaborative Perception
About
Perception is crucial for autonomous driving, but single-agent perception is often constrained by sensors' physical limitations, leading to degraded performance under severe occlusion, adverse weather conditions, and when detecting distant objects. Multi-agent collaborative perception offers a solution, yet challenges arise when integrating heterogeneous agents with varying model architectures. To address these challenges, we propose STAMP, a scalable, task- and model-agnostic collaborative perception pipeline for heterogeneous agents. STAMP uses lightweight adapter-reverter pairs to transform Bird's Eye View (BEV) features between agent-specific and shared protocol domains, enabling efficient feature sharing and fusion. This approach minimizes computational overhead, enhances scalability, and preserves model security. Experiments on simulated and real-world datasets show that STAMP achieves accuracy comparable or superior to state-of-the-art models at significantly reduced computational cost. As a first-of-its-kind task- and model-agnostic framework, STAMP aims to advance research on scalable and secure mobility systems toward Level 5 autonomy. Our project page is at https://xiangbogaobarry.github.io/STAMP and the code is available at https://github.com/taco-group/STAMP.
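The sketch below illustrates the adapter-reverter idea described above: each heterogeneous agent keeps its own frozen backbone and task head, and only a small adapter (agent domain to shared protocol domain) and reverter (protocol domain back to agent domain) are trained so that BEV features can be fused across agents. The class name, layer choices, and channel sizes here are illustrative assumptions, not the released implementation; see the repository linked above for the authors' code.

```python
# Minimal, assumption-laden sketch of an adapter-reverter pair for one agent.
# Layer choices and names are hypothetical; they only illustrate the idea of
# mapping agent-specific BEV features to/from a shared protocol feature space.
import torch
import torch.nn as nn


class AdapterReverterPair(nn.Module):
    """Lightweight adapter/reverter for a single heterogeneous agent."""

    def __init__(self, agent_channels: int, protocol_channels: int):
        super().__init__()
        # Small convolutions keep the pair lightweight; channel counts differ
        # per agent because each agent may use a different backbone.
        self.adapter = nn.Sequential(
            nn.Conv2d(agent_channels, protocol_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(protocol_channels, protocol_channels, kernel_size=3, padding=1),
        )
        self.reverter = nn.Sequential(
            nn.Conv2d(protocol_channels, agent_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(agent_channels, agent_channels, kernel_size=3, padding=1),
        )

    def to_protocol(self, bev_feat: torch.Tensor) -> torch.Tensor:
        """Agent-specific BEV features -> shared protocol domain."""
        return self.adapter(bev_feat)

    def to_agent(self, protocol_feat: torch.Tensor) -> torch.Tensor:
        """Fused protocol-domain features -> agent-specific domain."""
        return self.reverter(protocol_feat)


if __name__ == "__main__":
    # Two heterogeneous agents with different backbone channel widths.
    ego = AdapterReverterPair(agent_channels=256, protocol_channels=128)
    cav = AdapterReverterPair(agent_channels=384, protocol_channels=128)

    ego_bev = torch.randn(1, 256, 100, 100)  # ego agent's native BEV feature
    cav_bev = torch.randn(1, 384, 100, 100)  # collaborator's native BEV feature

    # Both agents project into the shared protocol space; element-wise max
    # stands in for whatever fusion module the protocol model actually uses.
    fused = torch.max(ego.to_protocol(ego_bev), cav.to_protocol(cav_bev))

    # The ego agent reverts the fused features to its own domain and would
    # then pass them to its frozen task head (detection, segmentation, ...).
    ego_out = ego.to_agent(fused)
    print(ego_out.shape)  # torch.Size([1, 256, 100, 100])
```

Because only the adapter and reverter are trained per agent, adding a new heterogeneous agent does not require retraining the other agents' models, which is what makes the scheme scalable and keeps each agent's model weights private.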
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Object Detection | OPV2V | AP@0.50 | 87.6 | 146 |
| 3D Object Detection | V2XSet | AP@0.50 | 85.8 | 70 |
| Collaborative Perception | OPV2V (test) | AP@50 | 88.6 | 32 |
| Collaborative Perception | V2XSet (test) | AP@50 | 85.4 | 32 |
| 3D Multi-Object Tracking | RCooper | AMOTA | 23.1 | 7 |
| 3D Object Detection | RCooper | AP@50 (A1) | 47.3 | 7 |
| Object Detection | V2XSet | Performance Score 1 | 84.2 | 7 |
| 3D Object Detection | RCooper (test) | Base Score | 87.6 | 4 |