
Affordances from Human Videos as a Versatile Representation for Robotics

About

Building a robot that can understand and learn to interact by watching humans has inspired several vision problems. However, despite some successful results on static datasets, it remains unclear how current models can be used on a robot directly. In this paper, we aim to bridge this gap by leveraging videos of human interactions in an environment-centric manner. Utilizing internet videos of human behavior, we train a visual affordance model that estimates where and how in the scene a human is likely to interact. The structure of these behavioral affordances directly enables the robot to perform many complex tasks. We show how to seamlessly integrate our affordance model with four robot learning paradigms: offline imitation learning, exploration, goal-conditioned learning, and action parameterization for reinforcement learning. We show the efficacy of our approach, which we call VRB, across 4 real-world environments, over 10 different tasks, and 2 robotic platforms operating in the wild. Results, visualizations, and videos at https://robo-affordances.github.io/
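The abstract describes an affordance model that, given a scene image, predicts where contact is likely (a contact heatmap) and how the interaction proceeds (a post-contact trajectory). The sketch below illustrates what such an interface might look like; the function name, signature, and placeholder logic are assumptions for illustration, not VRB's actual model or API, which is a learned network trained on internet videos.

```python
import numpy as np

def predict_affordance(image: np.ndarray, num_waypoints: int = 5):
    """Illustrative stand-in for a learned affordance model (not VRB's code).

    Given an RGB image of shape (H, W, 3), return:
      - contact_heatmap: (H, W) likelihood of human contact per pixel
      - trajectory: (num_waypoints, 2) post-contact waypoints in pixel space
    """
    h, w, _ = image.shape
    # Placeholder: peak the contact likelihood at the image center.
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h / 2, w / 2
    contact_heatmap = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (0.1 * h * w))
    contact_heatmap /= contact_heatmap.sum()  # normalize to a distribution

    # Contact point = argmax of the heatmap.
    contact = np.unravel_index(np.argmax(contact_heatmap), contact_heatmap.shape)

    # Placeholder post-contact trajectory: move straight "up" from the contact.
    steps = np.arange(1, num_waypoints + 1)[:, None]
    trajectory = np.array(contact)[None, :] + steps * np.array([-5, 0])[None, :]
    return contact_heatmap, trajectory

image = np.zeros((64, 64, 3), dtype=np.uint8)
heatmap, traj = predict_affordance(image)
print(heatmap.shape, traj.shape)  # (64, 64) (5, 2)
```

A robot policy could consume these two outputs directly: the heatmap's argmax gives a grasp/contact location, and the waypoints parameterize the motion after contact, which is how an affordance representation can plug into imitation, exploration, or RL pipelines.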

Shikhar Bahl, Russell Mendonca, Lili Chen, Unnat Jain, Deepak Pathak • 2023

Related benchmarks

Task | Dataset | Result | Rank
Grasping | Epic-Kitchens (Held-out Rare Objects) | Success Rate: 53 | 20
Imitation Learning | Cabinet (unseen) | Success Rate: 60 | 10
Imitation Learning | Knife (unseen) | Success Rate: 0.3 | 10
Imitation Learning | Veg (unseen) | Success Rate: 60 | 10
Imitation Learning | Shelf (unseen) | Success Rate: 0.8 | 10
Imitation Learning | Door (unseen) | Success Rate: 100 | 10
Imitation Learning | Lid (unseen) | Success Rate: 0.4 | 10
Imitation Learning | Drawer (unseen) | Success Rate: 100 | 10
Imitation Learning | Pot (unseen) | Success Rate: 0.8 | 10
Robot Manipulation | FrankaKitchen, PartManip, and ManiSkill simulation benchmarks (test) | T01 Success Rate: 100 | 6

(Showing 10 of 15 rows.)
