Device-Conditioned Neural Architecture Search for Efficient Robotic Manipulation
About
The growing complexity of visuomotor policies poses significant challenges for deployment under heterogeneous robotic hardware constraints. However, most existing model-efficiency approaches for robotic manipulation are device- and model-specific, generalize poorly, and require time-consuming per-device optimization during adaptation. In this work, we propose a unified framework named Device-Conditioned Quantization-For-All (DC-QFA), which amortizes deployment effort through device-conditioned quantization-aware training and hardware-constrained architecture search. Specifically, we introduce a single supernet that spans a rich design space over network architectures and mixed-precision bit-widths. It is optimized with latency- and memory-aware regularization, guided by per-device lookup tables. With this supernet, for each target platform we can perform a lightweight once-for-all search to select an optimal subnet without any per-device re-optimization, which enables more generalizable deployment across heterogeneous hardware and substantially reduces deployment time. To improve long-horizon stability under low precision, we further introduce multi-step on-policy distillation to mitigate error accumulation during closed-loop execution. Extensive experiments on three representative policy backbones, namely DiffusionPolicy-T, MDT-V, and OpenVLA-OFT, demonstrate that DC-QFA achieves 2-3× acceleration on edge devices, consumer-grade GPUs, and cloud platforms, with a negligible drop in task success. Real-world evaluations on an Inovo robot equipped with a force/torque sensor further validate that our low-bit DC-QFA policies maintain stable, contact-rich manipulation even under severe quantization.
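The lookup-table-guided subnet selection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the table contents, the `proxy_score` stand-in for task success, the exhaustive search, and all function names are assumptions made purely for illustration. It only shows the general idea of estimating a subnet's latency from a per-device table and picking the best candidate that fits a latency budget.

```python
from itertools import product

# Hypothetical per-device lookup tables: (layer_width, bit_width) -> latency in ms.
# All numbers are invented for illustration, not measured values.
LATENCY_LUT = {
    "edge": {(256, 4): 0.8, (256, 8): 1.2, (512, 4): 1.5, (512, 8): 2.4},
    "gpu": {(256, 4): 0.2, (256, 8): 0.25, (512, 4): 0.3, (512, 8): 0.4},
}


def estimate_latency(device, subnet):
    """Sum per-layer latencies for a subnet using the device's lookup table."""
    lut = LATENCY_LUT[device]
    return sum(lut[layer] for layer in subnet)


def proxy_score(subnet):
    """Assumed stand-in for task success: wider, higher-precision layers score higher."""
    return sum(width / 512 + bits / 8 for width, bits in subnet)


def search_subnet(device, num_layers, budget_ms):
    """Exhaustively pick the best-scoring subnet that fits the latency budget.

    Returns None when no candidate satisfies the budget.
    """
    choices = list(LATENCY_LUT[device])
    best, best_score = None, float("-inf")
    for subnet in product(choices, repeat=num_layers):
        if estimate_latency(device, subnet) > budget_ms:
            continue  # violates the hardware constraint for this device
        score = proxy_score(subnet)
        if score > best_score:
            best, best_score = subnet, score
    return best


if __name__ == "__main__":
    for device, budget in [("edge", 2.5), ("gpu", 1.0)]:
        print(device, search_subnet(device, num_layers=2, budget_ms=budget))
```

In this toy setting the same candidate space yields a narrower 8-bit subnet on the tighter "edge" budget and the widest 8-bit subnet on the "gpu" budget, with no retraining between devices. A realistic search space would be far too large for exhaustive enumeration and would need a sampling or evolutionary search instead.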
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robotic Manipulation | LIBERO | Spatial Success Rate | 97.2 | 314 |
| Robotic Manipulation | Calvin ABCD→D | Avg Length | 3.64 | 89 |
| Robotic Manipulation | CALVIN D->D | Average Length | 4.48 | 40 |
| Robotic Manipulation | Push-T (multiple rollouts) | Success Rate | 77 | 13 |
| Inserting a red pepper into a cup | Inovo robotic platform Real-world | Success Count | 11 | 8 |
| Picking up a carrot and placing it into a bowl | Inovo robotic platform Real-world | Success Count | 13 | 8 |
| Picking up eggs and placing them into a box | Inovo robotic platform Real-world | Success Count | 14 | 4 |
| Sweeping coffee beans into a dustpan | Inovo robotic platform Real-world | Success Count | 18 | 4 |