Test-time Sparsity for Extreme Fast Action Diffusion

About

Action diffusion excels at high-fidelity action generation but incurs heavy computational costs owing to its iterative denoising nature. Although existing methods show promise in accelerating diffusion transformers by reusing cached features, they struggle to adapt to policy dynamics arising from diverse perceptions and multi-round rollout iterations in open environments. We propose test-time sparsity to tackle this challenge, which aims to accelerate action diffusion by dynamically predicting prunable residual computations for each model forward pass at test time. However, two bottlenecks remain in this paradigm: 1) repetitive conditional encoding and pruning offset most potential speed gains, and 2) the features cached from previous denoising timesteps cannot constrain large pruning errors under aggressive sparsity. To address the first bottleneck, we design a highly parallelized inference pipeline that reduces the non-decoder delay to milliseconds. Specifically, we first design a lightweight pruner that shares the encoder with the diffusion transformer. Then, we decouple the encoding and pruning from the autoregressive denoising loop by processing all denoising timesteps in parallel, and overlap the pruner with the decoder's forward inference through asynchronous execution. To overcome the second bottleneck, we introduce an omnidirectional reusing strategy, which achieves 95% sparsity by selectively reusing features cached from the current forward pass, previous denoising timesteps, and earlier rollout iterations. To learn the rollout-level reusing strategies, we sample a few action trajectories to supervise the sparsified diffusion step by step. Extensive experiments demonstrate that our method reduces FLOPs by 92% and accelerates action generation by 5x, achieving lossless performance with an inference frequency of 47.5 Hz. Our code is available at https://github.com/ky-ji/Test-time-Sparsity.
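The core idea of pruning residual computations and reusing cached features can be illustrated with a minimal sketch. This is not the authors' implementation: the function `sparse_forward`, the hand-written `skip_mask`, and the toy scaling blocks are all hypothetical stand-ins. In the paper, a learned pruner predicts which computations to skip, and the cache draws on the current forward pass, previous denoising timesteps, and earlier rollout iterations; here a single dictionary filled by an earlier call plays that role.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
# Each block returns only its residual f(x); the caller adds it to x.
Block = Callable[[Vector], Vector]


def sparse_forward(
    x: Vector,
    blocks: Sequence[Block],
    skip_mask: Sequence[bool],
    cache: Dict[int, Vector],
) -> Tuple[Vector, int]:
    """Run residual blocks, reusing a cached residual wherever the mask allows.

    Returns the output and the number of blocks actually computed.
    """
    computed = 0
    for i, block in enumerate(blocks):
        if skip_mask[i] and i in cache:
            residual = cache[i]       # reuse: the block is not evaluated
        else:
            residual = block(x)       # compute and refresh the cache
            cache[i] = residual
            computed += 1
        x = [a + r for a, r in zip(x, residual)]
    return x, computed


# Toy "network": four residual blocks that scale their input (assumption).
blocks = [lambda v, s=s: [s * a for a in v] for s in (0.1, 0.2, 0.3, 0.4)]
cache: Dict[int, Vector] = {}

# Dense pass: nothing is skipped, so every residual gets cached.
dense_out, dense_computed = sparse_forward([1.0, 2.0], blocks, [False] * 4, cache)

# Sparse pass: a hand-set mask (a learned pruner in the paper) skips 3 of 4 blocks.
mask = [False, True, True, True]
sparse_out, sparse_computed = sparse_forward([1.0, 2.0], blocks, mask, cache)

sparsity = 1 - sparse_computed / len(blocks)
```

Because the sparse pass sees the same input as the dense pass, the reused residuals are exact and the two outputs match. When the input changes between denoising timesteps or rollout iterations, reuse becomes an approximation, which is why the choice of what to skip and which cached feature to reuse matters at high sparsity.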

Kangye Ji, Yuan Meng, Jianbo Zhou, Ye Li, Chen Tang, Zhi Wang • 2026

Related benchmarks

Task	Dataset	Metric	Value
Class-to-image generation	ImageNet	Speedup	2.63	54
Multi-stage Robotic Manipulation	Kitchen (test)	Success Rate (Kit_p1)	100	15
Square	Mixed Human (MH)	Success Rate	82	6
Can	Mixed Human (MH)	Success Rate	94	6
Lift	Mixed Human (MH)	Success Rate	100	6
Insert	ManiSkill Insert	Success Rate	16	2
Pick	ManiSkill Pick	Success Rate	80	2
Push	ManiSkill Push	Success Rate	100	2
Stack	ManiSkill Stack	Success Rate	80	2

