HSC-VLA: Hierarchical Scene-Clearing for Robust Bimanual Manipulation in Dense Clutter

About

Modern Vision-Language-Action (VLA) models often fail to follow instructions in high-density manipulation environments, where task-irrelevant visual clutter dilutes attention, corrupts grounding, and substantially degrades performance in complex long-horizon scenarios. To overcome the representation bottleneck of monolithic end-to-end architectures, we propose HSC-VLA, a hierarchical framework that decouples high-level visual-semantic reasoning from low-level, high-frequency sensorimotor execution through an explicit scene-clearing abstraction. HSC-VLA employs a high-level Brain to decompose long-horizon tasks and to generate task-specific scene masks that preserve task-relevant geometry while suppressing distractors. The filtered observations are then passed to a low-level Cerebellum, a diffusion-based policy that performs bimanual manipulation using only mask-filtered vision and proprioception. Extensive experiments on densely cluttered supermarket shelves show that HSC-VLA achieves 86.7% aggregate success under high-density clutter, surpassing the best monolithic baseline (π₀-Full FT at 34.3%) by 52.4 percentage points. HSC-VLA also performs well on long-horizon tasks, reaching 72% on clutter sorting and 66% on restocking, which indicates robustness and effective failure recovery in complex cluttered manipulation.
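
To make the scene-clearing idea concrete, here is a minimal sketch of how a task-specific mask might be applied to an observation before it reaches a low-level policy. It is not the authors' implementation: the function names `apply_scene_mask` and `percentage_point_gap`, the binary-mask representation, and the black fill color are all assumptions chosen for illustration. The second helper simply reproduces the reported gap between 86.7% and 34.3% as a difference in percentage points.

```python
from typing import List, Tuple

# Hypothetical types for illustration; the abstract does not specify data formats.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]


def apply_scene_mask(image: Image, mask: Mask, fill: Pixel = (0, 0, 0)) -> Image:
    """Keep pixels where the mask is True and replace all others with `fill`.

    Assumption: the high-level module emits a binary mask with the same
    height and width as the image.
    """
    if len(image) != len(mask) or any(
        len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same height and width")
    return [
        [px if keep else fill for px, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def percentage_point_gap(ours: float, baseline: float) -> float:
    """Absolute difference between two success rates, in percentage points."""
    return round(ours - baseline, 1)


# A 2x2 toy image: two task-relevant pixels and two distractor pixels.
image = [[(10, 20, 30), (200, 200, 200)], [(5, 5, 5), (90, 80, 70)]]
mask = [[True, False], [False, True]]

filtered = apply_scene_mask(image, mask)
gap = percentage_point_gap(86.7, 34.3)
```

In this toy example the distractor pixels are blanked out while the task-relevant pixels pass through unchanged, which is the property the abstract attributes to the mask-filtered observations.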

Zhen Liu, Xinyu Ning, Zhe Hu, XinXin Xie, Yitong Liu, Zhongzhu Pu • 2026

Related benchmarks

Task                   Dataset                                    Metric                     Result  Rank
Aggregate tasks        Low Density Clutter                        Aggregate Score            90.7    7
Aggregate tasks        High Density Clutter                       Aggregate Score            86.7    7
Bimanual Manipulation  Low Density Clutter                        Success Rate @ 100 steps   96      7
Bimanual Manipulation  High Density Clutter                       Success Rate @ 100 steps   97      7
Grasp                  High Density Clutter                       Success Rate @ 300         85      7
Place                  Low Density Clutter                        Success Rate @ 200         84      7
Place                  High Density Clutter                       Success Rate @ 200         78      7
Grasp                  Low Density Clutter                        Success Rate @ 300         92      7
Clutter sorting        Long-horizon manipulation Clutter sorting  Success Rate @ 50 steps    72      2
Restocking             Long-horizon manipulation Restocking       Success Rate @ 50          66      2
