
Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets

About

Vision-based policies for robot manipulation have achieved significant recent success, but remain brittle to distribution shifts such as camera viewpoint variations. Robot demonstration data is scarce and often lacks appropriate variation in camera viewpoints. Simulation offers a way to collect robot demonstrations at scale with comprehensive coverage of different viewpoints, but presents a visual sim2real challenge. To bridge this gap, we propose MANGO, an unpaired image translation method with a novel segmentation-conditioned InfoNCE loss, a highly regularized discriminator design, and a modified PatchNCE loss. We find that these elements are crucial for maintaining viewpoint consistency during sim2real translation. Training MANGO requires only a small amount of fixed-camera data from the real world, yet we show that the method can generate diverse unseen viewpoints by translating simulated observations. In this setting, MANGO outperforms all other image translation methods we tested. In certain real-world tabletop manipulation tasks, MANGO augmentation increases shifted-view success rates by over 40 percentage points compared to policies trained without augmentation.
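For readers unfamiliar with the InfoNCE family of losses named in the abstract, the following is a minimal sketch of the standard, generic InfoNCE objective for a single query. It is not MANGO's segmentation-conditioned variant or its modified PatchNCE loss, neither of which is specified on this page. The function name `info_nce`, the default temperature of 0.07, and the use of cosine similarity are illustrative assumptions.

```python
import math


def _normalize(v):
    # Scale a vector to unit length (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive, negatives, temperature=0.07):
    """Generic InfoNCE loss for one query (hypothetical helper, not MANGO's loss).

    Computes -log( exp(s_pos / t) / (exp(s_pos / t) + sum_i exp(s_neg_i / t)) ),
    where s is cosine similarity and t is the temperature.
    """
    q = _normalize(query)
    # The positive logit goes first, followed by one logit per negative.
    logits = [_dot(q, _normalize(positive)) / temperature]
    logits += [_dot(q, _normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


# The loss is small when the query matches its positive and large when it
# matches a negative instead.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In patch-based translation losses of this kind, the query and positive are typically features from corresponding locations in the input and translated images, with other locations serving as negatives. How MANGO conditions this choice on segmentation is not described here.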

Jeremiah Coholich, Justin Wit, Robert Azarcon, Zsolt Kira • 2026

Related benchmarks

Task | Dataset | Result | Rank
Sim2real Unpaired Image Translation | pick up coke Fixed View (test) | FID 108 | 8
Sim2real Unpaired Image Translation | pick up coke Randomized View (test) | FID 160.9 | 8
Sim2real Unpaired Image Translation | pick up coke Wrist View (test) | FID 191.3 | 8
Coffee | RLBench Sim2sim Shared Object - Coffee | Success Rate 64.67 | 6
Hammer | RLBench Sim2sim Unseen Object - Hammer | Success Rate 86 | 6
Nut Asm. | RLBench Sim2sim Cross-Embodiment - Nut Asm. | Success Rate 45.33 | 6
Stack | RLBench Sim2sim Shared Object - Stack | Success Rate 71.33 | 6
Threading | RLBench Sim2sim Unseen Object - Threading | Success Rate 30 | 6
PickPlace | RLBench Sim2sim Cross-Embodiment - PickPlace | Success Rate 13.33 | 6
