BOLT: Online Lightweight Adaptation for Preparation-Free Heterogeneous Cooperative Perception
About
Most existing heterogeneous cooperative perception methods depend on prior preparation, such as offline joint training or tailored collaborator-model adaptation. Such preparation is generally impractical in real scenarios, because agents are usually trained independently by different developers and encounter one another only occasionally online. This work investigates preparation-free heterogeneous cooperative perception, where agents use independently trained single-agent detectors without any pre-deployment coordination. We find that direct cross-agent fusion in this setting substantially underperforms ego-only perception. We present BOLT, a lightweight plug-and-play module that adapts neighboring features online via ego-as-teacher distillation, requiring only ego predictions and no ground-truth labels. BOLT uses high-confidence ego perception features to guide cross-agent feature-domain alignment, while allowing neighbors to contribute features in the ego's low-confidence regions. With only 0.9M trainable parameters, BOLT improves AP@50 by up to 32.3 points over vanilla unadapted fusion in the preparation-free setting. It consistently outperforms ego-only results on DAIR-V2X and OPV2V across different encoder pairs and fusion strategies. Code: https://github.com/sidiangongyuan/BOLT.
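The confidence-guided idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the BOLT implementation: the function names, the threshold `tau`, the mean-squared-error loss, and the linear blending rule are all assumptions chosen for illustration, and features are assumed to be bird's-eye-view maps of shape (C, H, W) with an ego confidence map of shape (H, W) in [0, 1].

```python
import numpy as np


def confidence_masked_distill_loss(ego_feat, nbr_feat, ego_conf, tau=0.5):
    """Hypothetical ego-as-teacher loss (assumed form, not from the paper).

    Computes the mean squared error between neighbor and ego features,
    averaged only over spatial cells where ego confidence is at least tau.
    """
    mask = (ego_conf >= tau).astype(ego_feat.dtype)      # (H, W)
    sq_err = ((nbr_feat - ego_feat) ** 2).mean(axis=0)   # (H, W), mean over channels
    n_confident = mask.sum()
    if n_confident == 0:
        return 0.0  # no confident ego cells, so no supervision signal
    return float((sq_err * mask).sum() / n_confident)


def confidence_gated_fusion(ego_feat, nbr_feat, ego_conf):
    """Hypothetical fusion rule (assumed form, not from the paper).

    Blends per cell: the ego feature dominates where ego confidence is high,
    and the neighbor feature fills in where ego confidence is low.
    """
    w = ego_conf[None, :, :]                             # broadcast over channels
    return w * ego_feat + (1.0 - w) * nbr_feat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ego = rng.normal(size=(4, 8, 8))
    nbr = rng.normal(size=(4, 8, 8))
    conf = rng.uniform(size=(8, 8))
    print("distillation loss:", confidence_masked_distill_loss(ego, nbr, conf))
    print("fused shape:", confidence_gated_fusion(ego, nbr, conf).shape)
```

In this sketch the loss needs only ego features and an ego confidence map, which mirrors the label-free supervision described above. In an actual system the neighbor features would first pass through a small trainable adapter, and the loss would be used to update that adapter online.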
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Cooperative Perception | DAIR-V2X | AP@30: 72.9 | 16 |
| Cooperative Perception | OPV2V | AP@30: 86.8 | 16 |
| 3D Object Detection | DAIR-V2X PP→SECOND | AP@30: 72.6 | 5 |
| Cooperative Object Detection | OPV2V (test) | AP@30: 82.2 | 2 |