ClearDepth: Enhanced Stereo Perception of Transparent Objects for Robotic Manipulation
About
Depth perception of transparent objects is a challenge in everyday life and in logistics, primarily because standard 3D sensors cannot accurately capture depth on transparent or reflective surfaces. This limitation significantly affects applications that rely on depth maps and point clouds, especially robotic manipulation. We developed a vision transformer-based algorithm for stereo depth recovery of transparent objects. It is complemented by a novel feature post-fusion module that improves the accuracy of depth recovery by exploiting structural features in the images. To address the high cost of collecting datasets for stereo camera-based perception of transparent objects, our method incorporates a parameter-aligned, domain-adaptive, and physically realistic Sim2Real simulation for efficient data generation, accelerated by AI algorithms. Our experimental results demonstrate the model's strong Sim2Real generalization in real-world scenarios, enabling precise depth mapping of transparent objects to assist robotic manipulation. Project details are available at https://sites.google.com/view/cleardepth/ .
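Stereo depth recovery ends in a depth map and, often, a point cloud. The sketch below shows only the standard rectified-stereo geometry that turns a predicted disparity map into those outputs. It is not ClearDepth's network or post-fusion module. The function names, the use of NumPy, and the convention that non-positive disparity marks a missing pixel are assumptions made for illustration.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-6):
    """Convert a rectified-stereo disparity map (pixels) to metric depth (metres).

    Uses the pinhole stereo relation Z = f * B / d. Pixels whose disparity is
    not above min_disp (assumed invalid) are set to 0 to mark missing depth.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > min_disp
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map into an (N, 3) point cloud in the camera frame.

    Pixels with zero depth (missing) are dropped.
    """
    h, w = depth.shape
    # u indexes columns (x), v indexes rows (y); both arrays have shape (h, w).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]

if __name__ == "__main__":
    disp = np.array([[10.0, 20.0], [0.0, 40.0]])
    depth = disparity_to_depth(disp, focal_px=500.0, baseline_m=0.05)
    print(depth)  # 0 where disparity was missing
    print(depth_to_points(depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5).shape)
```

Because depth is inversely proportional to disparity, a small disparity error on a distant transparent surface becomes a large depth error. This is one reason disparity accuracy matters for grasping.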
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Stereo Matching | Middlebury 2014 | Bad Pixel Rate (Thresh 2.0, All) | 3.48 | 15 |
| Stereo Depth Estimation | SynClearDepth transparent-object (val) | Average Error (AvgErr) | 3.1084 | 7 |
| Robotic Grasping | Real-world cluttered L1 | Grasp Success Rate | 92 | 6 |
| Robotic Grasping | Real-world single L1 | Grasp Success Rate | 98 | 3 |
| Robotic Grasping | Real-world single L2 | Grasp Success Rate | 98 | 3 |
| Robotic Grasping | Real-world cluttered L2 | Grasp Success Rate | 90 | 3 |
| Robotic Grasping | Real-world single L1 (test) | Grasp Success Rate | 98 | 3 |
| Robotic Grasping | Real-world single L2 (test) | Grasp Success Rate | 98 | 3 |
| Robotic Grasping | Real-world cluttered L2 (test) | Grasp Success Rate | 90 | 3 |
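The two stereo metrics in the table are commonly defined as follows. Bad Pixel Rate at threshold 2.0 is the percentage of evaluated pixels whose absolute disparity error exceeds 2.0 px. Average Error is the mean absolute disparity error in pixels. The sketch below is a minimal illustration under stated assumptions. The function name is hypothetical. Invalid ground truth is assumed to be encoded as non-finite values, as in Middlebury's PFM files. "All" is taken to mean every pixel with valid ground truth. The exact masking used for the SynClearDepth numbers is not stated here.

```python
import numpy as np

def stereo_error_metrics(pred_disp, gt_disp, bad_thresh=2.0):
    """Return (bad_pixel_rate_percent, average_error_px) for disparity maps.

    Assumption: pixels with non-finite ground truth (e.g. inf) are excluded.
    """
    pred = np.asarray(pred_disp, dtype=np.float64)
    gt = np.asarray(gt_disp, dtype=np.float64)
    valid = np.isfinite(gt)
    if not np.any(valid):
        raise ValueError("ground truth has no valid pixels")
    err = np.abs(pred[valid] - gt[valid])
    bad_rate = 100.0 * float(np.mean(err > bad_thresh))  # % of pixels with error > thresh
    avg_err = float(np.mean(err))                          # mean absolute error in px
    return bad_rate, avg_err

if __name__ == "__main__":
    pred = np.array([[1.0, 5.0], [3.0, 10.0]])
    gt = np.array([[1.0, 2.0], [3.0, np.inf]])  # last pixel has no ground truth
    print(stereo_error_metrics(pred, gt))
```

In this toy case, one of the three valid pixels is off by 3 px. The bad-pixel rate is therefore about 33.3 %, and the average error is 1.0 px.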