SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
About
While spatial reasoning has made progress in object localization relationships, it often overlooks object orientation, a key factor in 6-DoF fine-grained manipulation. Traditional pose representations rely on pre-defined frames or templates, limiting generalization and semantic grounding. In this paper, we introduce the concept of semantic orientation, which defines object orientations using natural language in a reference-frame-free manner (e.g., the "plug-in" direction of a USB or the "handle" direction of a cup). To support this, we construct OrienText300K, a large-scale dataset of 3D objects annotated with semantic orientations, and develop PointSO, a general model for zero-shot semantic orientation prediction. By integrating semantic orientation into VLM agents, our SoFar framework enables 6-DoF spatial reasoning and generates robotic actions. Extensive experiments demonstrate the effectiveness and generalization of SoFar, e.g., a zero-shot success rate of 48.7% on Open6DOR and a zero-shot success rate of 74.9% on SIMPLER-Env.
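To make the idea of a semantic orientation concrete, the following is a minimal sketch and not the SoFar or PointSO implementation. It assumes a semantic orientation can be stored as a language phrase paired with a unit direction vector, and that alignment with a desired direction is measured by the angle between the two. The names `SemanticOrientation`, `normalize`, and `angle_between_deg` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class SemanticOrientation:
    """Hypothetical container: a language phrase tied to a 3D direction."""
    phrase: str       # e.g. "handle" or "plug-in"
    direction: Vec3   # assumed to be a unit vector in the observing frame


def normalize(v: Vec3) -> Vec3:
    """Scale a vector to unit length; reject the zero vector."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def angle_between_deg(a: Vec3, b: Vec3) -> float:
    """Angle in degrees between two directions (inputs need not be unit length)."""
    a, b = normalize(a), normalize(b)
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # clamp against floating-point drift
    return math.degrees(math.acos(dot))


# Illustrative values only: a cup whose "handle" direction points along +x,
# and an instruction that asks for the handle to point along +y.
handle = SemanticOrientation("handle", normalize((2.0, 0.0, 0.0)))
target_direction = (0.0, 1.0, 0.0)
error_deg = angle_between_deg(handle.direction, target_direction)
```

Under these assumptions, `error_deg` is the rotation still needed to bring the named direction onto the target. An angular error of this kind is one plausible way to score orientation alignment.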
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robot Manipulation | SimplerEnv WidowX Robot tasks | Average Success Rate | 5.83e+3 | 26 |
| Put Carrot on Plate | SimplerEnv WidowX | Success Rate | 0.667 | 18 |
| Put Spoon on Towel | SimplerEnv WidowX | Success Rate | 58.3 | 18 |
| Stack Green on Yellow | SimplerEnv WidowX | Success Rate | 70.8 | 18 |
| Put Eggplant in Basket | SimplerEnv WidowX | Success Rate | 37.5 | 18 |
| Simulation Object Manipulation | SimplerEnv WidowX + Bridge setup | Placement Success Rate (Spoon on Towel) | 58.3 | 16 |
| Simulation Robotic Manipulation | SimplerEnv Google Robot setup | Horizontal Laying | 86.1 | 10 |
| Visual Question Answering | 6-DoF SpatialBench 1.0 (test) | Rel. Position Error | 59.6 | 10 |
| Spatial Visual Question Answering | EmbSpatial-Bench (test) | Generation Score | 70.88 | 7 |
| 6-DoF Object Rearrangement | Open6DOR Isaac Sim V1 | Position Tracking Error (Level 0) | 86.3 | 6 |