SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

About

While spatial reasoning has made progress in reasoning about where objects are located relative to one another, it often overlooks object orientation, a key factor in fine-grained 6-DoF manipulation. Traditional pose representations rely on pre-defined frames or templates, limiting generalization and semantic grounding. In this paper, we introduce the concept of semantic orientation, which defines object orientations using natural language in a reference-frame-free manner (e.g., the "plug-in" direction of a USB or the "handle" direction of a cup). To support this, we construct OrienText300K, a large-scale dataset of 3D objects annotated with semantic orientations, and develop PointSO, a general model for zero-shot semantic orientation prediction. By integrating semantic orientation into VLM agents, our SoFar framework enables 6-DoF spatial reasoning and generates robotic actions. Extensive experiments demonstrate the effectiveness and generalization of SoFar, e.g., a zero-shot success rate of 48.7% on Open6DOR and a zero-shot success rate of 74.9% on SIMPLER-Env.
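To make the idea of semantic orientation concrete, here is a minimal sketch that pairs a language phrase with a unit direction vector and measures the angle to a desired direction. The `SemanticOrientation` class, the helper names, and the choice to express the direction in the observation frame are all illustrative assumptions; this is not the PointSO or SoFar implementation.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class SemanticOrientation:
    """Hypothetical container: a language phrase paired with a unit direction."""
    phrase: str        # e.g. "handle" or "plug-in"
    direction: Vec3    # assumed to be expressed in the observation frame


def normalize(v: Vec3) -> Vec3:
    """Scale a vector to unit length; a zero vector has no direction."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return tuple(c / norm for c in v)


def angle_deg(a: Vec3, b: Vec3) -> float:
    """Angle in degrees between two directions."""
    a, b = normalize(a), normalize(b)
    # Clamp the dot product to guard against floating-point drift outside [-1, 1].
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.degrees(math.acos(dot))


# Illustrative values only: a cup whose handle points along +x,
# and an instruction that asks for the handle to point along +y.
handle = SemanticOrientation("handle", normalize((1.0, 0.0, 0.0)))
target_direction = (0.0, 1.0, 0.0)
rotation_needed = angle_deg(handle.direction, target_direction)
```

In this toy setup the phrase carries the meaning and the vector carries the geometry, so a planner could compare the current and requested directions without a per-object canonical frame.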

Zekun Qi, Wenyao Zhang, Yufei Ding, Runpei Dong, Xinqiang Yu, Jingwen Li, Lingyun Xu, Baoyu Li, Xialin He, Guofan Fan, Jiazhao Zhang, Jiawei He, Jiayuan Gu, Xin Jin, Kaisheng Ma, Zhizheng Zhang, He Wang, Li Yi • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robot Manipulation | SimplerEnv WidowX Robot tasks | Average Success Rate | 5.83e+3 | 26 |
| Put Carrot on Plate | SimplerEnv WidowX | Success Rate | 0.667 | 18 |
| Put Spoon on Towel | SimplerEnv WidowX | Success Rate | 58.3 | 18 |
| Stack Green on Yellow | SimplerEnv WidowX | Success Rate | 70.8 | 18 |
| Put Eggplant in Basket | SimplerEnv WidowX | Success Rate | 37.5 | 18 |
| Simulation Object Manipulation | SimplerEnv WidowX + Bridge setup | Placement Success Rate (Spoon on Towel) | 58.3 | 16 |
| Simulation Robotic Manipulation | SimplerEnv Google Robot setup | Horizontal Laying | 86.1 | 10 |
| Visual Question Answering | 6-DoF SpatialBench 1.0 (test) | Rel. Position Error | 59.6 | 10 |
| Spatial Visual Question Answering | EmbSpatial-Bench (test) | Generation Score | 70.88 | 7 |
| 6-DoF Object Rearrangement | Open6DOR Isaac Sim V1 | Position Tracking Error (Level 0) | 86.3 | 6 |
Showing 10 of 11 rows
