Synesthesia of Vehicles: Tactile Data Synthesis from Visual Inputs
About
Autonomous vehicles (AVs) rely on multi-modal sensor fusion for safety, but current visual and optical sensors cannot detect the road-induced excitations that are critical for vehicle dynamics control. Inspired by human synesthesia, we propose Synesthesia of Vehicles (SoV), a novel framework for predicting tactile excitations from visual inputs in autonomous vehicles. We develop a cross-modal spatiotemporal alignment method to address temporal and spatial disparities between the two modalities. Furthermore, we propose a visual-tactile synesthetic (VTSyn) generative model based on latent diffusion for unsupervised, high-quality tactile data synthesis. Using a real-vehicle perception system, we collected a multi-modal dataset across diverse road and lighting conditions. Extensive experiments show that VTSyn outperforms existing models on temporal, frequency, and classification metrics, enhancing AV safety through proactive tactile perception.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Road surface classification | Road surface classification dataset | Accuracy | 64.06 | 4 |
| Tactile Data Generation | Tactile Road Surface Dataset (Asphalt) | RMSE | 0.0388 | 4 |
| Tactile Data Generation | Tactile Road Surface Dataset (Cement) | RMSE | 0.0629 | 4 |
| Tactile Data Generation | Tactile Road Surface Dataset (Muddy Road) | RMSE | 0.1848 | 4 |
| Tactile Data Generation | Tactile Road Surface Dataset (Dirt Road) | RMSE | 0.1343 | 4 |
| Tactile Data Generation | Tactile Road Surface Dataset (Gravel) | RMSE | 0.1667 | 4 |
| Tactile Data Generation | Tactile Road Surface Dataset (Brick Road) | RMSE | 0.0817 | 4 |
| Tactile Data Generation | Tactile Road Surface Dataset (All roads) | RMSE | 0.1115 | 4 |
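The RMSE figures above compare synthesized tactile signals against measured ground truth. A minimal sketch of that metric (the function and variable names are illustrative, not taken from the paper's code):

```python
import numpy as np

def rmse(generated: np.ndarray, measured: np.ndarray) -> float:
    """Root-mean-square error between a synthesized tactile signal
    and the measured ground-truth signal (same shape, same units)."""
    generated = np.asarray(generated, dtype=float)
    measured = np.asarray(measured, dtype=float)
    if generated.shape != measured.shape:
        raise ValueError("signals must have the same shape")
    return float(np.sqrt(np.mean((generated - measured) ** 2)))

# Toy example with short 1-D vibration traces:
g = np.array([0.10, 0.05, -0.02, 0.08])
m = np.array([0.12, 0.04, -0.01, 0.05])
print(round(rmse(g, m), 4))  # → 0.0194
```

Lower is better: a per-surface RMSE near zero means the generated excitation closely tracks the sensor recording for that road type.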