Ensuring Force Safety in Vision-Guided Robotic Manipulation via Implicit Tactile Calibration
About
In dynamic environments, robots often face constrained motion trajectories when manipulating objects with specific properties, such as doors. Applying an appropriate force is therefore crucial to prevent damage to both the robot and the object. However, current vision-guided methods for generating robot states often fall short in this regard because they do not integrate tactile perception. To tackle this issue, this paper introduces SafeDiff, a novel state diffusion framework. It generates a prospective state sequence from the current robot state and the visual context observation, and incorporates real-time tactile feedback to refine that sequence. As far as we know, this is the first study specifically focused on ensuring force safety in robotic manipulation. SafeDiff significantly improves the rationality of state planning, and the safe action trajectory is derived from the refined plan via inverse dynamics. Unlike previous approaches, which concatenate visual and tactile data to generate future robot state sequences, our method uses tactile data as a calibration signal that implicitly adjusts the robot's state within the state space. Additionally, we have developed a large-scale simulation dataset, SafeDoorManip50k, which offers extensive multimodal data for training and evaluating the proposed method. Extensive experiments show that our visual-tactile model substantially mitigates the risk of harmful forces during door opening, in both simulated and real-world settings.
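
The contrast drawn above between concatenating tactile data and using it as a calibration signal can be illustrated with a toy sketch. This is not the SafeDiff architecture: the function names, dimensions, random weights, and the `tanh`-bounded offset are all assumptions chosen only to show the difference between the two fusion styles.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, TACTILE_DIM, VISUAL_DIM, HORIZON = 7, 6, 16, 8

def fuse_by_concatenation(visual_feat, tactile_feat):
    """Concatenation style: tactile data is appended to the visual input."""
    return np.concatenate([visual_feat, tactile_feat])

def calibrate_states(planned_states, tactile_feat, weights, gain=0.1):
    """Calibration style: tactile data yields a bounded offset that shifts
    each planned state (toy stand-in, NOT the paper's actual module)."""
    offset = gain * np.tanh(weights @ tactile_feat)  # shape (STATE_DIM,)
    return planned_states + offset                   # broadcast over the horizon

rng = np.random.default_rng(0)
planned = rng.normal(size=(HORIZON, STATE_DIM))      # vision-only state plan
weights = rng.normal(size=(STATE_DIM, TACTILE_DIM))  # assumed learned mapping
tactile = rng.normal(size=TACTILE_DIM)
visual = rng.normal(size=VISUAL_DIM)

fused = fuse_by_concatenation(visual, tactile)         # one longer input vector
refined = calibrate_states(planned, tactile, weights)  # same shape as the plan
```

In this sketch the calibrated plan keeps the shape of the vision-only plan and reduces to it when the tactile signal is zero, whereas concatenation changes the input that the generator has to consume.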
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Shoe Lacing | Shoe Lacing 100 Demos, Bi-Arx5 Dual-arm 1.0 (test) | Success Rate | 0 | 13 |
| Cucumber Peeling | Cucumber Peeling 50 Demos, Bi-Arx5 Dual-arm 1.0 (test) | Task Score | 63.5 | 13 |
| Lock Opening | Lock Opening 20 Demos Flexiv Rizon4 Single-arm 1.0 (test) | Success Rate | 5 | 13 |
| Vase Wiping | Vase Wiping 30 Demos Flexiv Rizon4 Single-arm 1.0 (test) | Task Score | 32.5 | 13 |
| Multi-task Performance Aggregation | Combined Five Tasks (Shoe Lacing, Chip Handover, Cucumber Peeling, Vase Wiping, Lock Opening) 1.0 (average) | Average Performance | 22.2 | 13 |
| Chip Handover | Chip Handover 50 Demos Bi-Arx5 Dual-arm 1.0 (test) | Success Rate | 10 | 13 |
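
The aggregate figure appears to be consistent with an unweighted mean of the five per-task results. The check below is a quick sketch based only on the numbers in the table; it assumes that success rates and task scores share a common scale, which the page does not state.

```python
# Per-task results copied from the table above.
results = {
    "Shoe Lacing": 0.0,
    "Cucumber Peeling": 63.5,
    "Lock Opening": 5.0,
    "Vase Wiping": 32.5,
    "Chip Handover": 10.0,
}

# Unweighted mean across the five tasks (assumed aggregation rule).
average = sum(results.values()) / len(results)
print(round(average, 1))  # 22.2
```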