
Understanding 3D Object Interaction from a Single Image

About

Humans can easily understand a single image as depicting multiple potential objects permitting interaction. We use this skill to plan our interactions with the world and to accelerate our understanding of new objects without having to interact with them. In this paper, we aim to endow machines with a similar ability, so that intelligent agents can better explore 3D scenes or manipulate objects. Our approach is a transformer-based model that predicts the 3D location, physical properties, and affordance of objects. To power this model, we collect a dataset of Internet videos, egocentric videos, and indoor images to train and validate our approach. Our model yields strong performance on our data and generalizes well to robotics data. Project site: https://jasonqsy.github.io/3DOI/

Shengyi Qian, David F. Fouhey • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Affordance prediction | AGD20K (unseen) | KLD | 3.565 | 20
Articulated Object Manipulation | Real-robot manipulation trials, Textured Hinge | OSR | 60 | 9
Articulated Object Manipulation | 50 tasks in campus environments | Right Hinge Time (s) | 33.4 | 9
Articulated Object Manipulation | Real-robot manipulation trials, Right Hinge | OSR | 40 | 9
Articulated Object Manipulation | Real-robot manipulation trials, mean across 50 tasks | Overall Success Rate (OSR) | 52 | 9
Articulated Object Manipulation | Real-robot manipulation trials, Prismatic Hinge | OSR | 70 | 9
Articulated Object Manipulation | Real-robot manipulation trials, Left Hinge | OSR | 40 | 9
Articulated Object Manipulation | Real-robot manipulation trials, Bottom Hinge | OSR | 50 | 8
Articulated Object Axis Estimation | Campus-scale 50 tasks (test) | Right Hinge Axis EA-Score | 71.8 | 4
Articulated Object Segmentation | Campus-scale 50 tasks (test) | Right Hinge Mask IoU | 72 | 3
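The metrics in the table are not defined on this page. As a rough guide, the sketch below gives common textbook forms of three of them: the Overall Success Rate (successful trials divided by total trials, as a percentage), a KL-divergence (KLD) score between normalized affordance heatmaps (lower is better), and mask IoU. The function names are hypothetical, the exact KLD formulation and epsilon used by the official AGD20K evaluation may differ, and EA-Score and the timing metric are not sketched. It also serves as a consistency check: the unweighted mean of the five per-hinge OSR values in the table (60, 40, 70, 40, 50) is 52, matching the reported mean. The per-hinge counts in the usage section are an assumption (10 trials per hinge type, which would give the 50 tasks), chosen only to reproduce those percentages.

```python
import math
from typing import Dict, Sequence, Tuple


def overall_success_rate(successes: int, trials: int) -> float:
    """OSR as a percentage: successful trials divided by total trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return 100.0 * successes / trials


def pooled_osr(per_category: Dict[str, Tuple[int, int]]) -> float:
    """Pool hypothetical (successes, trials) counts across categories into one OSR."""
    total_successes = sum(s for s, _ in per_category.values())
    total_trials = sum(t for _, t in per_category.values())
    return overall_success_rate(total_successes, total_trials)


def kld(pred: Sequence[float], gt: Sequence[float], eps: float = 1e-12) -> float:
    """KL(gt || pred) between two flattened heatmaps, each normalized to sum to 1.

    A common saliency-style formulation (lower is better); the epsilon
    handling in the official AGD20K evaluation may differ.
    """
    if len(pred) != len(gt):
        raise ValueError("heatmaps must have the same number of pixels")
    p_total, q_total = sum(pred), sum(gt)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("heatmaps must have positive mass")
    p = [x / p_total for x in pred]
    q = [x / q_total for x in gt]
    # Pixels where the ground truth is zero contribute nothing to the sum.
    return sum(qi * math.log(eps + qi / (pi + eps)) for pi, qi in zip(p, q))


def mask_iou(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Intersection over union of two flattened binary masks."""
    if len(a) != len(b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Convention (an assumption here): two empty masks count as a perfect match.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    # Hypothetical counts: assumes 10 trials per hinge type (5 x 10 = 50 tasks),
    # chosen only to reproduce the per-hinge OSR values listed in the table.
    counts = {
        "Textured": (6, 10),
        "Right": (4, 10),
        "Prismatic": (7, 10),
        "Left": (4, 10),
        "Bottom": (5, 10),
    }
    print(pooled_osr(counts))  # 52.0
```

With equal trials per category, pooling the counts and averaging the per-category percentages give the same number; with unequal trial counts the two would differ, so the pooled form is the safer reading of "mean across 50 tasks".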
