
4D Visual Pre-training for Robot Learning

About

General visual representations learned from web-scale datasets have achieved great success in robotics in recent years, enabling data-efficient robot learning on manipulation tasks. Yet these pre-trained representations are mostly learned from 2D images, neglecting the inherent 3D nature of the world. Because large-scale 3D data is scarce, it remains hard to extract a universal 3D representation from web datasets. As an alternative, we seek a general visual pre-training framework that can improve any 3D representation. Our framework, FVP, is a novel 4D Visual Pre-training framework for real-world robot learning. FVP frames the visual pre-training objective as a next-point-cloud-prediction problem, implements the predictor as a diffusion model, and pre-trains it directly on large public datasets. Across twelve real-world manipulation tasks, FVP boosts the average success rate of 3D Diffusion Policy (DP3) by 28%. The FVP pre-trained DP3 achieves state-of-the-art performance among imitation learning methods. Moreover, FVP remains effective across various point cloud encoders and datasets. Finally, we apply FVP to RDT-1B, a larger Vision-Language-Action robotic model, enhancing its performance on various robot tasks. Our project page is available at: https://4d-visual-pretraining.github.io/
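The abstract describes the pre-training objective only at a high level: predict the next point cloud with a diffusion model. The sketch below illustrates what one training step of such an objective could look like, using a standard noise-prediction (DDPM-style) loss. It is not the authors' implementation. The linear noise schedule, the centroid "encoder", the placeholder denoiser, and all function names are assumptions made for illustration.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def add_noise(next_cloud, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps, per coordinate."""
    noise = [[rng.gauss(0.0, 1.0) for _ in point] for point in next_cloud]
    noisy = [
        [math.sqrt(alpha_bar) * c + math.sqrt(1.0 - alpha_bar) * e
         for c, e in zip(point, eps)]
        for point, eps in zip(next_cloud, noise)
    ]
    return noisy, noise


def encode(cloud):
    """Hypothetical stand-in for a point cloud encoder: returns the centroid."""
    n = len(cloud)
    return [sum(p[i] for p in cloud) / n for i in range(3)]


def toy_denoiser(noisy, t, cond):
    """Placeholder noise predictor; a real model would be a trained network."""
    return [[c - m for c, m in zip(point, cond)] for point in noisy]


def diffusion_loss(current_cloud, next_cloud, alpha_bars, rng):
    """Mean squared error between predicted and true noise on the next frame."""
    t = rng.randrange(len(alpha_bars))            # sample a diffusion step
    noisy, noise = add_noise(next_cloud, alpha_bars[t], rng)
    cond = encode(current_cloud)                  # condition on the current frame
    pred = toy_denoiser(noisy, t, cond)
    sq = [(p - e) ** 2 for pp, ee in zip(pred, noise) for p, e in zip(pp, ee)]
    return sum(sq) / len(sq)


rng = random.Random(0)
alpha_bars = make_alpha_bars(100)
current = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
nxt = [[x + 0.1, y, z] for x, y, z in current]    # next frame: small shift in x
loss = diffusion_loss(current, nxt, alpha_bars, rng)
```

In this sketch the encoder output from the current frame is the only conditioning signal, so gradients of the loss would flow into the encoder during pre-training. That is one plausible way for a prediction objective to shape a 3D representation, but the page itself does not specify the conditioning scheme.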

Chengkai Hou, Yanjie Ze, Yankai Fu, Zeyu Gao, Songbo Hu, Yue Yu, Shanghang Zhang, Huazhe Xu • 2025

Related benchmarks

Task                          Dataset                                      Result                   Rank
Robot Manipulation            Adroit                                       Pen Task Score: 76       50
Robot Manipulation            MetaWorld                                    Success Rate (Easy): 80  10
Bell Pressing                 Franka Real-World Manipulation (Evaluation)  Success Rate: 55         9
Cover Block                   Franka Real-World Manipulation (Evaluation)  Success Rate: 50         9
Block-to-Block Alignment      Franka Manipulation Real-World (Evaluation)  Success Rate: 40         9
Fruit Pick-and-Place          Franka Manipulation Real-World (Evaluation)  Success Rate: 40         9
Robot Manipulation Aggregate  Franka Manipulation Real-World (Evaluation)  Mean Success Rate: 46    9
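The aggregate row appears consistent with an unweighted mean of the four Franka task results listed above. The page does not state how the aggregate is computed, so treat that as an assumption. A quick check under that assumption:

```python
# Success rates for the four Franka tasks, copied from the table above.
franka_success = {
    "Bell Pressing": 55,
    "Cover Block": 50,
    "Block-to-Block Alignment": 40,
    "Fruit Pick-and-Place": 40,
}

# Assumed aggregation: unweighted mean over tasks.
mean_success = sum(franka_success.values()) / len(franka_success)
print(mean_success)  # 46.25, which rounds to the reported 46
```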
