Text2LiDAR: Text-guided LiDAR Point Cloud Generation via Equirectangular Transformer
About
Complex traffic environments and varied weather conditions make collecting LiDAR data expensive and challenging, so high-quality, controllable LiDAR data generation is urgently needed. Text is a common way to control generation, but text-controlled LiDAR generation has received little research attention. To this end, we propose Text2LiDAR, the first efficient, diverse, and text-controllable LiDAR data generation model. Specifically, we design an equirectangular transformer architecture whose equirectangular attention captures LiDAR features in a manner consistent with the characteristics of the data. We then design a control-signal embedding injector that efficiently integrates control signals through a global-to-focused attention mechanism. Additionally, we devise a frequency modulator that helps the model recover high-frequency details, ensuring the clarity of the generated point clouds. To foster development in the field and optimize text-controlled generation performance, we construct nuLiDARtext, which offers diverse text descriptors for 34,149 LiDAR point clouds from 850 scenes. Experiments on uncontrolled generation and on various forms of text-controlled generation, conducted on the KITTI-360 and nuScenes datasets, demonstrate the superiority of our approach.
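For readers unfamiliar with the equirectangular representation mentioned above, the following is a minimal sketch of the standard spherical projection that maps a LiDAR point cloud to an equirectangular range image. It is not the authors' implementation: the function name `project_to_equirectangular`, the image size, and the vertical field-of-view defaults are illustrative assumptions, chosen to roughly resemble a 64-beam sensor.

```python
import math


def project_to_equirectangular(points, height=64, width=1024,
                               fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) points onto a height x width range image.

    Rows index elevation (top row = highest beam) and columns index
    azimuth. Empty pixels hold 0.0. When several points fall into the
    same pixel, the nearest one is kept. All defaults are assumptions.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down

    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no direction
        azimuth = math.atan2(y, x)      # in [-pi, pi]
        elevation = math.asin(z / r)    # in [-pi/2, pi/2]
        if not (fov_down <= elevation <= fov_up):
            continue  # outside the assumed vertical field of view

        u = 0.5 * (1.0 - azimuth / math.pi)   # normalized column, [0, 1]
        v = (fov_up - elevation) / fov        # normalized row, [0, 1]
        col = min(int(u * width), width - 1)
        row = min(int(v * height), height - 1)

        # Keep the closest return per pixel.
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
    return image
```

Each pixel of the resulting image stores a range value, with azimuth wrapping around horizontally. This wrap-around is one of the data characteristics that an attention mechanism operating on such images would plausibly need to respect.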
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| LiDAR Semantic Segmentation | SemanticKITTI | mIoU | 58.35 | 36 |
| LiDAR Scene Generation | KITTI-360 (val) | FRD | 425.9 | 9 |
| Unconditional LiDAR Generation | KITTI-360 19 | FRD | 164.2 | 8 |
| LiDAR Scene Generation | nuScenes 2 | FPD | 147.5 | 7 |
| LiDAR point cloud generation | KITTI-360 Text conditioned | FRD | 170.1 | 6 |
| LiDAR Generation | Constructed multi-domain dataset Vehicle | FRD | 456.4 | 4 |
| LiDAR Generation | Constructed multi-domain dataset | FRD | 617.5 | 4 |
| LiDAR Generation | Constructed multi-domain dataset Fog | FRD | 1.20e+3 | 4 |
| LiDAR Generation | Constructed multi-domain dataset Rain | FRD | 1.03e+3 | 4 |
| LiDAR Generation | Constructed multi-domain dataset Wet Ground | FRD | 850.3 | 4 |