Wavelet Policy: Imitation Learning in the Scale Domain with World Prior Memory

About

Conventional visuomotor imitation learning usually predicts future robot actions directly in the time domain. Such formulations often have limited physical scene awareness and weak long-horizon memory. World-model-based perception and memory-augmented policies can improve world awareness, but at the cost of substantial computational overhead. In this work, we propose Wavelet Policy, a lightweight imitation learning framework that combines World Prior Memory (WPM) with wavelet-based multi-scale action modeling. Our key idea is to encode persistent physical scene structure from static background images into compact memory tokens, which are fused into world-prior tokens and injected into the encoder during forward propagation. Based on this memory-conditioned representation, we further perform wavelet-domain decomposition over horizon-aligned latent action tokens and adopt a Single-Encoder Multiple-Decoder (SE2MD) architecture to model latent components at different temporal scales. The resulting latent subbands are reconstructed through an inverse wavelet transform and finally projected into executable action chunks. To facilitate efficient world prior learning, we introduce a world-prior adaptation loss that encourages the background encoder to retain persistent scene knowledge while remaining lightweight and stable. Extensive experiments on four simulated and six real-world robotic manipulation tasks show that Wavelet Policy consistently outperforms strong baselines. These results demonstrate that combining scale-domain action modeling with world-prior memory provides an effective and efficient solution for long-horizon embodied manipulation. We release the source code, data, and model checkpoint for the simulation tasks at https://github.com/lurenjia384/Wavelet_Policy.
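To make the scale-domain idea concrete, here is a minimal sketch of splitting a sequence along the horizon axis into subbands and reconstructing it with the inverse transform. It is an illustration under stated assumptions, not the released implementation. The abstract does not name a wavelet family or a number of decomposition levels, so the Haar wavelet and two levels are assumed here. The sketch also operates on a hypothetical scalar trajectory rather than on learned latent action tokens, and the function names are invented for this example.

```python
import math

def haar_dwt(seq):
    """One-level Haar transform of an even-length sequence.

    Returns (approx, detail): the coarse (low-frequency) and
    fine (high-frequency) subbands, each half the input length.
    """
    if len(seq) % 2 != 0:
        raise ValueError("sequence length must be even")
    s = math.sqrt(2.0)
    approx = [(seq[i] + seq[i + 1]) / s for i in range(0, len(seq), 2)]
    detail = [(seq[i] - seq[i + 1]) / s for i in range(0, len(seq), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

def multilevel_dwt(seq, levels):
    """Repeatedly split the coarse band.

    Returns [approx_L, detail_L, ..., detail_1], coarsest first.
    """
    approx = list(seq)
    details = []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return [approx] + details[::-1]

def multilevel_idwt(subbands):
    """Rebuild the sequence from the coarsest band outward."""
    approx = subbands[0]
    for d in subbands[1:]:
        approx = haar_idwt(approx, d)
    return approx

# Hypothetical 8-step trajectory for a single action dimension (assumption).
trajectory = [0.0, 0.1, 0.3, 0.6, 1.0, 1.0, 0.9, 0.7]
subbands = multilevel_dwt(trajectory, levels=2)
reconstructed = multilevel_idwt(subbands)
```

In the architecture described above, each subband would be modeled by its own decoder before the inverse transform is applied. Here the subbands are passed through unchanged, so `reconstructed` matches `trajectory` up to floating-point error.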

Changchuan Yang, Yuhang Dong, Guanzhong Tian, Haizhou Ge, Hongrui Zhu • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Stacking | Real-world | -- | 9 |
| Bimanual Insertion | Simulation (test) | Grasp Success Rate: 89.5 | 6 |
| Stack Two Blocks | Simulation (test) | Stack Success Rate: 96.4 | 6 |
| Transfer Cube | Simulation (test) | Touch Rate: 99.5 | 6 |
| Transfer Plus | Simulation (test) | Lift: 91.8 | 6 |
| stack blocks | Real-world | Success Rate (Stack): 80 | 2 |
| Store Items | Real-world | Success Rate (First): 90 | 2 |
| Store Lemon | Real-world | Success Rate (Grasp): 90 | 2 |
| Store Strawberry | Real-world | Success Rate (Grasp): 100 | 2 |
| Assist Sewing | Real-world | Success Rate (Contact): 80 | 2 |