ManiVID-3D: Generalizable View-Invariant Reinforcement Learning for Robotic Manipulation via Disentangled 3D Representations

About

Deploying visual reinforcement learning (RL) policies for real-world manipulation is often hindered by changes in camera viewpoint. A policy trained with a fixed front-facing camera may fail when the camera is shifted, which is unavoidable in real-world settings where sensor placement is hard to control precisely. Existing methods often rely on precise camera calibration or struggle with large perspective changes. To address these limitations, we propose ManiVID-3D, a novel 3D RL architecture for robotic manipulation that learns view-invariant representations through self-supervised disentangled feature learning. The framework incorporates ViewNet, a lightweight yet effective module that automatically aligns point cloud observations from arbitrary viewpoints into a unified spatial coordinate system without requiring extrinsic calibration. Additionally, we develop an efficient GPU-accelerated batch rendering module that processes over 5000 frames per second, enabling large-scale training for 3D visual RL at unprecedented speeds. Extensive evaluation across 10 simulated and 5 real-world tasks shows that our approach achieves a 40.6% higher success rate than state-of-the-art methods under viewpoint variations while using 80% fewer parameters. The system's robustness to severe perspective changes and its strong sim-to-real performance highlight the effectiveness of learning geometrically consistent representations for scalable robotic manipulation in unstructured environments.

Zheng Li, Pei Qu, Yufei Jia, Shihui Zhou, Haizhou Ge, Jiahang Cao, Jinni Zhou, Guyue Zhou, Jun Ma• 2025
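The abstract states that ViewNet aligns point clouds from arbitrary viewpoints into a unified coordinate frame without extrinsic calibration, but this page gives no details of how. To illustrate only that goal, the sketch below swaps in a classical technique, PCA-based canonicalization, which is neither ViewNet nor the authors' method. It centers an (N, 3) cloud, rotates it onto its principal axes, and fixes each axis sign using the third moment, so the same points observed after an arbitrary camera rotation and translation map to identical canonical coordinates. The helper names `canonicalize_pca` and `random_rotation` are hypothetical.

```python
import numpy as np


def canonicalize_pca(points: np.ndarray) -> np.ndarray:
    """Map an (N, 3) point cloud into a canonical frame via PCA.

    Hypothetical helper, NOT ViewNet: a classical, calibration-free
    canonicalization used only to illustrate view alignment.
    """
    # Remove the translation component by centering on the centroid.
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    # eigh returns eigenvalues in ascending order; put the largest axis first.
    _, eigvecs = np.linalg.eigh(cov)
    axes = eigvecs[:, ::-1]
    projected = centered @ axes
    # Each eigenvector is only defined up to sign; choose the sign that makes
    # the third moment (skew) along that axis positive.
    signs = np.sign((projected ** 3).sum(axis=0))
    signs[signs == 0] = 1.0
    return projected * signs


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: sample a 3x3 rotation matrix (det = +1) via QR."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # make the decomposition unique
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]       # turn a reflection into a proper rotation
    return q


rng = np.random.default_rng(0)
# Anisotropic, skewed cloud so principal axes and signs are well defined.
cloud = rng.exponential(size=(500, 3)) * np.array([3.0, 1.5, 0.5])
R = random_rotation(rng)
t = np.array([0.2, -0.5, 1.0])
view_b = cloud @ R.T + t  # the same points seen from a moved "camera"
print(np.allclose(canonicalize_pca(cloud), canonicalize_pca(view_b), atol=1e-6))  # True
```

This toy assumes point-to-point correspondence between views, distinct principal variances, and nonzero skew along each axis. Real camera shifts cause occlusion and partial views that break these assumptions, which is one reason a learned alignment module is attractive; how ViewNet achieves its robustness is not described on this page.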

Related benchmarks

Task	Dataset	Metric	Result
Cube Lift	AIRBOT Play CubeLift	Success Rate (%)	92	11
Button Press	Maniwhere-inspired Benchmark AIRBOT Play	Success Rate (%)	98.8	8
Button Press Dex	Maniwhere-inspired Benchmark UR5	Success Rate (%)	98.4	8
Drawer Open	Maniwhere-inspired Benchmark UR5	Success Rate (%)	85.6	8
Hand Over Dual	Maniwhere-inspired Benchmark Franka	Success Rate (%)	95.6	8
Laptop Close	Maniwhere-inspired Benchmark AIRBOT Play	Success Rate (%)	95.6	8
Pick & Place Dex	Maniwhere-inspired Benchmark Franka	Success Rate (%)	88.8	8
Pick & Place	Maniwhere-inspired Benchmark AIRBOT Play	Success Rate (%)	91.2	8
Reach	Maniwhere-inspired Benchmark AIRBOT Play	Success Rate (%)	99.2	8
Reach Dex	Maniwhere-inspired Benchmark UR5	Success Rate (%)	99.6	8

10 of the 11 benchmark results are listed above.
