History-Aware Visuomotor Policy Learning via Point Tracking

About

Many manipulation tasks require memory beyond the current observation, yet most visuomotor policies rely on the Markov assumption and thus struggle with repeated states or long-horizon dependencies. Existing methods attempt to extend observation horizons but remain insufficient for diverse memory requirements. To this end, we propose an object-centric history representation based on point tracking, which abstracts past observations into a structured form that retains only essential task-relevant information. Tracked points are encoded and aggregated at the object level, yielding a compact history representation that can be seamlessly integrated into various visuomotor policies. Our design provides full history-awareness with high computational efficiency, leading to improved overall task performance and decision accuracy. Through extensive evaluations on diverse manipulation tasks, we show that our method addresses multiple facets of memory requirements (task stage identification, spatial memorization, and action counting, as well as longer-term demands such as continuous and pre-loaded memory) and consistently outperforms both Markovian baselines and prior history-based approaches.

Project website: http://tonyfang.net/history
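To make "encoded and aggregated at the object level" concrete, here is a minimal sketch, not the authors' implementation. It assumes 2D point tracks of shape (T, N, 2) and a known point-to-object assignment, and it swaps in a plain linear resampling and mean pooling where the paper presumably uses learned components. The function names `encode_point_history` and `aggregate_by_object` and the parameter `num_samples` are hypothetical.

```python
import numpy as np

def encode_point_history(tracks, num_samples=8):
    """Resample each point's trajectory to a fixed length (assumed encoding).

    tracks: array of shape (T, N, 2), positions of N points over T frames.
    Returns an array of shape (N, num_samples * 2), independent of T.
    """
    T, N, _ = tracks.shape
    src = np.linspace(0.0, 1.0, T)            # normalized time of observed frames
    dst = np.linspace(0.0, 1.0, num_samples)  # normalized time of resampled frames
    feats = np.empty((N, num_samples, 2))
    for n in range(N):
        for d in range(2):
            feats[n, :, d] = np.interp(dst, src, tracks[:, n, d])
    return feats.reshape(N, -1)

def aggregate_by_object(point_feats, object_ids):
    """Mean-pool point features per object (assumed aggregation).

    point_feats: (N, F) per-point features; object_ids: (N,) integer labels.
    Returns the sorted unique ids and an array of shape (num_objects, F).
    """
    ids = np.unique(object_ids)
    pooled = np.stack([point_feats[object_ids == i].mean(axis=0) for i in ids])
    return ids, pooled

# Toy history: object 0 slides right by one unit per frame, object 1 stays still.
tracks = np.zeros((5, 4, 2))
tracks[:, :2, 0] = np.arange(5)[:, None]
object_ids = np.array([0, 0, 1, 1])

ids, history = aggregate_by_object(encode_point_history(tracks), object_ids)
print(history.shape)  # one fixed-size vector per object
```

The point of the sketch is only that the history vector has a fixed size per object regardless of how many frames were observed, which is one way a history representation can stay compact as the episode grows.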

Jingjing Chen, Hongjie Fang, Chenxi Wang, Shiquan Wang, Cewu Lu • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Add-Salt manipulation	Add-Salt	Success Rate (SR)	85	12
One-Move manipulation	One-Move	Success Rate (SR)	95	12
Swap-Easy manipulation	Swap-Easy	Success Rate (SR)	90	12
Three-Scoop manipulation	Three-Scoop	Success Rate (SR)	95	12
Swap-Hard manipulation	Swap-Hard	Success Rate (SR)	80	9
Guess task with pre-loaded memory	Guess Easy	Success Rate (SR)	95	2
Guess task with pre-loaded memory	Guess Hard	Success Rate (SR)	85	2
