
TapSampling: Inference-Time Sampling with a Task-Progress-Understanding Verifier for Robotic Manipulation

About

Existing embodied control research achieves remarkable performance improvements by scaling training data and model size. We instead explore inference-time strategies as an alternative axis. Non-deterministic generative models, such as diffusion and autoregressive models, have been widely adopted in embodied control, but the single-shot inference paradigm limits their performance. In this paper, we propose TapSampling, a plug-and-play framework for inference-time sampling. First, we introduce an Action-VAE that represents actions in a low-dimensional latent space: it maps the policy-generated initial action to a compressed posterior distribution, from which any number of latent samples can be drawn and decoded into candidate actions that approximate the true action distribution. Second, we formulate action verification as task-progress outcome prediction, using the intrinsic sequential structure of robotic datasets to train a semantically grounded verifier for interpretable action selection. Furthermore, TapSampling is policy-agnostic. Extensive experiments in both simulated and real-world environments demonstrate that our method substantially improves multiple generalist policies without further policy finetuning. Code and models are available at the project page.
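
The sample-then-verify loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`sample_candidates`, `select_action`), the Gaussian posterior, and the toy linear encoder, decoder, and progress scorer are all assumptions made for illustration. In the paper the verifier is trained to predict task progress; the stand-in below scores only the action itself.

```python
import numpy as np

def sample_candidates(initial_action, encode, decode, num_samples, rng):
    """Encode the policy's initial action, sample latents, decode candidates."""
    mu, log_var = encode(initial_action)       # posterior parameters (Gaussian assumed)
    std = np.exp(0.5 * log_var)
    eps = rng.standard_normal((num_samples, mu.shape[0]))
    latents = mu + std * eps                   # reparameterized latent samples
    return np.stack([decode(z) for z in latents])

def select_action(candidates, progress_score):
    """Score every candidate with the verifier and keep the highest-scoring one."""
    scores = np.array([progress_score(a) for a in candidates])
    best = int(np.argmax(scores))
    return candidates[best], scores

# --- Hypothetical stand-ins for the learned Action-VAE and verifier ---
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 7)) * 0.1          # toy map: 7-D action <-> 2-D latent

def toy_encode(action):
    # Returns (mean, log-variance) of the latent posterior.
    return W @ action, np.full(2, -2.0)

def toy_decode(z):
    return W.T @ z

goal = np.ones(7)                              # made-up target for the toy scorer

def toy_progress_score(action):
    # Higher is better; a real verifier would predict task progress instead.
    return -np.linalg.norm(action - goal)

initial_action = np.zeros(7)                   # pretend this came from the base policy
candidates = sample_candidates(initial_action, toy_encode, toy_decode, 16, rng)
best_action, scores = select_action(candidates, toy_progress_score)
```

Because selection only needs an initial action from the policy and a scoring function, the same loop can wrap any base policy, which is the sense in which the framework is described as policy-agnostic.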

Sizhe Zhao, Shengping Zhang, Shuo Yang, Weiyu Zhao, Shuigen Wang, Xiangyang Ji • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Robotic Manipulation | Calvin ABCD→D | Avg Length | 2.41 | 130 |
| Pick-&-Place | Real-world robot manipulation unseen tasks | Average Success Rate | 66.7 | 7 |
| Robotic Manipulation | LIBERO LONG (test) | Task 0 Success Rate | 98 | 4 |
| Robotic Manipulation | Real-world Knock Down (Seen) | Success Rate | 93.3 | 2 |
| Robotic Manipulation | Real-world Knock Down (Unseen) | Success Rate | 93.3 | 2 |
| Robotic Manipulation | Real-world Pick and Place (Seen) | Success Rate | 76.7 | 2 |
| Robotic Manipulation | Real-world Stack (Seen) | Success Rate | 85 | 2 |
| Robotic Manipulation | Real-world Stack (Unseen) | Success Rate | 85 | 2 |