Learning to See and Act: Task-Aware Virtual View Exploration for Robotic Manipulation

About

Recent vision-language-action (VLA) models for multi-task robot manipulation often rely on fixed camera setups and shared visual encoders, which limit their performance under occlusions and during cross-task transfer. To address these challenges, we propose Task-aware Virtual View Exploration (TVVE), a framework that learns to select task-relevant virtual camera viewpoints and dynamically re-render observations from a reconstructed scene representation using the selected viewpoints. To enable efficient view selection, we train an exploration policy in a pseudo-environment. In addition, we introduce a Task-aware Mixture-of-Experts (TaskMoE) visual encoder that routes visual features to task-specialized experts, mitigating interference in multi-task learning. To evaluate robustness under distribution shifts, we construct RLBench-OG, an out-of-distribution benchmark with visual perturbations and camera pose variations. Experiments on RLBench and RLBench-OG demonstrate that TVVE achieves higher success rates than strong baselines, while real-robot experiments further confirm its robustness to visual disturbances and unseen instructions. Code and visualizations are available at: https://hcplab-sysu.github.io/TAVP.
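The abstract says TaskMoE routes visual features to task-specialized experts but gives no implementation details. As a rough illustration of the general idea only, the sketch below shows one common form of task-conditioned routing: a per-task gate picks the top-k experts and mixes their outputs with softmax weights. Everything here is an assumption made for illustration and is not taken from the paper: the class name `TaskGatedMoE`, the task-only gating, the linear experts, and the `top_k` parameter.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class TaskGatedMoE:
    """Hypothetical task-conditioned mixture-of-experts layer.

    Illustrative only; this is not the paper's TaskMoE architecture.
    """

    def __init__(self, num_tasks, num_experts, dim, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Assumption: the gate depends only on the task id, one logit per expert.
        self.gate_logits = [
            [rng.gauss(0.0, 1.0) for _ in range(num_experts)]
            for _ in range(num_tasks)
        ]
        # Assumption: each expert is a plain dim x dim linear map.
        self.experts = [
            [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def route(self, task_id):
        """Return (expert_index, weight) pairs for the top-k experts of a task."""
        logits = self.gate_logits[task_id]
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = ranked[: self.top_k]
        # Renormalize over the selected experts so the weights sum to 1.
        weights = softmax([logits[i] for i in top])
        return list(zip(top, weights))

    def forward(self, task_id, feature):
        """Mix the selected experts' outputs for one visual feature vector."""
        out = [0.0] * len(feature)
        for idx, weight in self.route(task_id):
            expert_out = matvec(self.experts[idx], feature)
            out = [o + weight * y for o, y in zip(out, expert_out)]
        return out


moe = TaskGatedMoE(num_tasks=3, num_experts=4, dim=5, top_k=2)
mixed = moe.forward(task_id=1, feature=[1.0, 0.5, -0.5, 0.0, 2.0])
```

Because only the selected experts process a given task's features, gradients from different tasks touch different expert weights, which is the usual argument for why this kind of routing can reduce interference in multi-task learning.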

Yongjie Bai, Zhouxia Wang, Yang Liu, Kaijun Luo, Yifan Wen, Mingtong Dai, Weixing Chen, Ziliang Chen, Lingbo Liu, Guanbin Li, Liang Lin • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Robotic Manipulation | RLBench multi-view (test) | Average Success Rate | 86.6 | 10 |
| Robot Manipulation | RLBench single-view setup | Average Success Rate | 83.2 | 8 |
| Robot Manipulation | RLBench-OG | Average Success Rate | 67 | 4 |
| Robot Manipulation | Real-world Tasks Franka (test) | Average Success Rate (%) | 78 | 3 |
| Robot Manipulation | Real-world Tasks Dobot (test) | Avg. Success Rate (%) | 88 | 2 |
| Robot Manipulation | Franka Real-world Environments | Average Success Rate (SR) | 70 | 2 |
| Robotic Manipulation | Dobot Nova 2 Pick Grape Average | Success Rate | 71.7 | 2 |
| Robotic Manipulation | Dobot Nova 2 Pick Grape (Seen) | Success Rate | 100 | 2 |
| Robotic Manipulation | Dobot Nova 2 Pick Grape (Unseen Instance) | Success Rate | 100 | 2 |
| Robotic Manipulation | Dobot Nova 2 Pick Grape Unseen Background | Success Rate | 90 | 2 |
Showing 10 of 13 rows
