HiFlow: Tokenization-Free Scale-Wise Autoregressive Policy Learning via Flow Matching

About

Coarse-to-fine autoregressive modeling has recently shown strong promise for visuomotor policy learning, combining the inference efficiency of autoregressive methods with the global trajectory coherence of diffusion-based policies. However, existing approaches rely on discrete action tokenizers that map continuous action sequences to codebook indices, a design inherited from image generation where learned compression is necessary for high-dimensional pixel data. We observe that robot actions are inherently low-dimensional continuous vectors, for which such tokenization introduces unnecessary quantization error and a multi-stage training pipeline. In this work, we propose Hierarchical Flow Policy (HiFlow), a tokenization-free coarse-to-fine autoregressive policy that operates directly on raw continuous actions. HiFlow constructs multi-scale continuous action targets from each action chunk via simple temporal pooling. Specifically, it averages contiguous action windows to produce coarse summaries that are refined at finer temporal resolutions. The entire model is trained end-to-end in a single stage, eliminating the need for a separate tokenizer. Experiments on MimicGen, RoboTwin 2.0, and real-world environments demonstrate that HiFlow consistently outperforms existing methods including diffusion-based and tokenization-based autoregressive policies.
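
The temporal pooling step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes an action chunk of shape (T, D) and a scale schedule whose entries each divide T evenly. The function name `build_multiscale_targets`, the default schedule `(1, 2, 4, 8)`, and the divisibility requirement are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def build_multiscale_targets(chunk, scales=(1, 2, 4, 8)):
    """Average contiguous action windows to get coarse-to-fine targets.

    chunk:  array of shape (T, D), one action chunk of T steps and D action dims.
    scales: number of temporal segments per level, coarse to fine (assumed schedule).
    Returns a list of arrays, one per scale, each of shape (s, D).
    """
    chunk = np.asarray(chunk, dtype=float)
    num_steps, action_dim = chunk.shape
    targets = []
    for s in scales:
        if num_steps % s != 0:
            # Assumption made for this sketch: every scale divides the chunk length.
            raise ValueError(f"scale {s} does not divide chunk length {num_steps}")
        window = num_steps // s
        # Group the chunk into s contiguous windows and average within each window.
        pooled = chunk.reshape(s, window, action_dim).mean(axis=1)
        targets.append(pooled)
    return targets
```

With an 8-step chunk and the schedule above, the coarsest target is a single vector equal to the mean action over the whole chunk, and the finest target is the raw chunk itself. No codebook or quantization step is involved at any level.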

Daichi Yashima, Koki Seno, Shuhei Kurita, Yusuke Oda, Komei Sugiura • 2026

Related benchmarks

Task	Dataset	Metric	Result (%)	
Grasping	Real-world Experiments	Apple Success Rate	75	4
Relocation	Real-world Experiments	Ball to Dish Success Rate	58.3	4
Robot Manipulation	MimicGen	Coffee Success Rate	100	4
Target Placement	Real-world Experiments	Success Rate (Orange→Plate)	41.7	4
Dual-arm Robot Manipulation	RoboTwin 2.0	Click Alarm Success	69	3
