
The Lie We Tell: Correcting the Euclidean Fallacy in Vision Language Action Policies via Score Matching on Tangent Space

About

Diffusion-based Vision-Language-Action policies achieve remarkable success in robotic manipulation, yet they commit a fundamental geometric error we term the $\textbf{Euclidean Fallacy}$: representing SE(3) poses as flat $\mathbb{R}^{12}$ vectors (a flattened $3\times3$ rotation matrix plus a 3D translation). This approximation induces (1) manifold drift that violates the SO(3) constraints, (2) broken equivariance under coordinate transformations, and (3) non-geodesic trajectories with excessive kinematic cost. We introduce $\textbf{Lie Diffuser Actor (LDA)}$, a diffusion framework that operates intrinsically on SE(3). Our method injects noise through left-invariant SDEs, predicts scores in the tangent space, and retracts samples onto the manifold via the exponential map. This formulation eliminates manifold drift by construction while guaranteeing coordinate-frame equivariance and geodesic optimality. On CALVIN ABC$\rightarrow$D, LDA improves the average task length from $3.27$ to $3.51$ ($+7.3\%$). We further validate LDA on a real robot, where it outperforms the baseline on the majority of tasks.
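The noising recipe described above (sample a perturbation in the tangent space, retract it with the exponential map, compose it with the current pose) can be illustrated with a minimal pure-Python sketch. It contrasts that step with entrywise Gaussian noise on the flat 12-number pose vector and measures how far each rotation drifts from SO(3) via $\|R^\top R - I\|_F$. Everything here is an assumption for illustration, not the authors' implementation: the helper names (`se3_exp`, `tangent_noise_step`, `euclidean_noise_step`, `orthogonality_error`), the twist ordering (translation $\rho$ first, rotation $\omega$ second), the fixed noise scale, and the choice to apply noise by right-multiplication $g \cdot \mathrm{Exp}(\xi)$, which is the flow of a left-invariant vector field. It covers one forward noising step only, not LDA's noise schedule, score network, or reverse sampler.

```python
import math
import random

Mat3 = list[list[float]]
Vec3 = list[float]

I3: Mat3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat3, v: Vec3) -> Vec3:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def axpy(a: Mat3, b: Mat3, s: float) -> Mat3:
    """Return a + s * b."""
    return [[a[i][j] + s * b[i][j] for j in range(3)] for i in range(3)]


def transpose(a: Mat3) -> Mat3:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def det3(a: Mat3) -> float:
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def hat(w: Vec3) -> Mat3:
    """Skew-symmetric matrix of w, i.e. the so(3) element with hat(w) @ v = w x v."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def se3_exp(rho: Vec3, omega: Vec3) -> tuple[Mat3, Vec3]:
    """Exponential map se(3) -> SE(3) for the twist xi = (rho, omega)."""
    theta = math.sqrt(sum(c * c for c in omega))
    W = hat(omega)
    W2 = matmul(W, W)
    if theta < 1e-8:  # Taylor limits of the coefficients near theta = 0
        a, b, c = 1.0, 0.5, 1.0 / 6.0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta**2
        c = (theta - math.sin(theta)) / theta**3
    R = axpy(axpy(I3, W, a), W2, b)  # Rodrigues' formula
    V = axpy(axpy(I3, W, b), W2, c)  # left Jacobian of SO(3)
    return R, matvec(V, rho)


def compose(g: tuple[Mat3, Vec3], h: tuple[Mat3, Vec3]) -> tuple[Mat3, Vec3]:
    """Group product g * h for poses stored as (R, t)."""
    (R1, t1), (R2, t2) = g, h
    Rt = matvec(R1, t2)
    return matmul(R1, R2), [Rt[i] + t1[i] for i in range(3)]


def orthogonality_error(R: Mat3) -> float:
    """Frobenius norm of R^T R - I; zero exactly when R is orthogonal."""
    RtR = matmul(transpose(R), R)
    return math.sqrt(sum((RtR[i][j] - I3[i][j]) ** 2 for i in range(3) for j in range(3)))


def euclidean_noise_step(R: Mat3, t: Vec3, sigma: float, rng: random.Random):
    """Flat R^12 update: add i.i.d. Gaussian noise to all 9 rotation and 3 translation entries."""
    R_new = [[R[i][j] + rng.gauss(0.0, sigma) for j in range(3)] for i in range(3)]
    return R_new, [t[i] + rng.gauss(0.0, sigma) for i in range(3)]


def tangent_noise_step(R: Mat3, t: Vec3, sigma: float, rng: random.Random):
    """Sample xi ~ N(0, sigma^2 I_6) in se(3), retract with exp, compose on the right (body frame)."""
    rho = [rng.gauss(0.0, sigma) for _ in range(3)]
    omega = [rng.gauss(0.0, sigma) for _ in range(3)]
    return compose((R, t), se3_exp(rho, omega))


# Run both random walks for 100 steps from the identity pose.
rng = random.Random(0)
R_e, t_e = I3, [0.0, 0.0, 0.0]
R_l, t_l = I3, [0.0, 0.0, 0.0]
for _ in range(100):
    R_e, t_e = euclidean_noise_step(R_e, t_e, 0.05, rng)
    R_l, t_l = tangent_noise_step(R_l, t_l, 0.05, rng)

print(f"flat R^12 noise:     ||R^T R - I|| = {orthogonality_error(R_e):.3f}, det = {det3(R_e):.3f}")
print(f"tangent-space noise: ||R^T R - I|| = {orthogonality_error(R_l):.1e}, det = {det3(R_l):.6f}")
```

Running it, the flat-vector walk ends with a large orthogonality error, while the tangent-space walk stays on SO(3) (orthogonal, determinant 1) up to floating-point rounding, which is the "no manifold drift by construction" property the abstract describes. Applying the noise on the right also means that re-expressing the starting pose in another world frame $h$ gives $h\cdot(g\cdot\mathrm{Exp}(\xi)) = (h\cdot g)\cdot\mathrm{Exp}(\xi)$, so the perturbation itself does not depend on the choice of world frame.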

Bing-Cheng Chuang, I-Hsuan Chu, Bor-Jiun Lin, YuanFu Yang, Min Sun, Chun-Yi Lee• 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Long-horizon language-conditioned manipulation | CALVIN ABC→D | Success Rate (Seq 1) | 93.7 | 12
Move Doll Platform | Real Robot Move Doll Platform | Success Rate | 100 | 2
Sort Blocks | Real Robot Sort Blocks | Success Rate | 75 | 2
Stack Cups | Real Robot Stack Cups | Success Rate | 60 | 2
Put Block in Box | Real Robot Put Block in Box | Success Rate | 0.75 | 2
