Learning to See and Act: Task-Aware Virtual View Exploration for Robotic Manipulation
About
Recent vision-language-action (VLA) models for multi-task robot manipulation often rely on fixed camera setups and shared visual encoders, which limit their performance under occlusions and during cross-task transfer. To address these challenges, we propose Task-aware Virtual View Exploration (TVVE), a framework that learns to select task-relevant virtual camera viewpoints and dynamically re-render observations from a reconstructed scene representation using the selected viewpoints. To enable efficient view selection, we train an exploration policy in a pseudo-environment. In addition, we introduce a Task-aware Mixture-of-Experts (TaskMoE) visual encoder that routes visual features to task-specialized experts, mitigating interference in multi-task learning. To evaluate robustness under distribution shifts, we construct RLBench-OG, an out-of-distribution benchmark with visual perturbations and camera pose variations. Experiments on RLBench and RLBench-OG demonstrate that TVVE achieves higher success rates than strong baselines, while real-robot experiments further confirm its robustness to visual disturbances and unseen instructions. Code and visualizations are available at: https://hcplab-sysu.github.io/TAVP.
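To make the TaskMoE idea of routing visual features to task-specialized experts concrete, here is a minimal sketch of task-conditioned top-k gating in plain Python. The abstract does not specify the gating mechanism, so everything here is an assumption for illustration: the softmax-over-dot-products gate, the top-k selection, and the names `route`, `task_moe`, `gate_weights`, and `experts` are hypothetical and are not taken from the TVVE code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(task_embedding, gate_weights, top_k=2):
    """Return (expert_index, weight) pairs for the top_k experts.

    Assumption: one gating logit per expert, computed as the dot product
    of the task embedding with that expert's gate vector.
    """
    logits = [sum(w * x for w, x in zip(row, task_embedding)) for row in gate_weights]
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)  # renormalize over the selected experts
    return [(i, probs[i] / norm) for i in top]


def task_moe(features, task_embedding, gate_weights, experts, top_k=2):
    """Combine the outputs of the selected experts, weighted by the gate."""
    out = [0.0] * len(features)
    for idx, weight in route(task_embedding, gate_weights, top_k):
        expert_out = experts[idx](features)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Toy experts (hypothetical): each scales the features by a different constant.
experts = [lambda f, s=s: [s * x for x in f] for s in (1.0, 2.0, 3.0, 4.0)]
# Toy gate: one 2-D gate vector per expert.
gate_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]

selected = route([2.0, 1.0], gate_weights, top_k=2)
mixed = task_moe([1.0, 2.0], [2.0, 1.0], gate_weights, experts, top_k=2)
```

Because the gate in this sketch depends only on the task embedding, every visual token of a given task is sent to the same experts, which is one simple way to keep tasks from interfering with each other; a gate that also looks at the visual features is an equally plausible design.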
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robotic Manipulation | RLBench multi-view (test) | Average Success Rate | 86.6 | 10 |
| Robot Manipulation | RLBench single-view setup | Average Success Rate | 83.2 | 8 |
| Robot Manipulation | RLBench-OG | Average Success Rate | 67 | 4 |
| Robot Manipulation | Real-world Tasks Franka (test) | Average Success Rate (%) | 78 | 3 |
| Robot Manipulation | Real-world Tasks Dobot (test) | Avg. Success Rate (%) | 88 | 2 |
| Robot Manipulation | Franka Real-world Environments | Average Success Rate (SR) | 70 | 2 |
| Robotic Manipulation | Dobot Nova 2 Pick Grape Average | Success Rate | 71.7 | 2 |
| Robotic Manipulation | Dobot Nova 2 Pick Grape (Seen) | Success Rate | 100 | 2 |
| Robotic Manipulation | Dobot Nova 2 Pick Grape (Unseen Instance) | Success Rate | 100 | 2 |
| Robotic Manipulation | Dobot Nova 2 Pick Grape Unseen Background | Success Rate | 90 | 2 |