Autoregressive Flow Matching for Motion Prediction

About

Motion prediction has been studied in different contexts with models trained on narrow distributions and applied to downstream tasks in human motion prediction and robotics. Simultaneously, recent efforts in scaling video prediction have demonstrated impressive visual realism, yet they struggle to accurately model complex motions despite massive scale. Inspired by the scaling of video generation, we develop autoregressive flow matching (ARFM), a new method for probabilistic modeling of sequential continuous data, and train it on diverse video datasets to generate future point track locations over long horizons. To evaluate our model, we develop benchmarks that measure how well motion prediction models predict human and robot motion. Our model is able to predict complex motions, and we demonstrate that conditioning robot action prediction and human motion prediction on predicted future tracks can significantly improve downstream task performance. Code and models are publicly available at: https://github.com/Johnathan-Xie/arfm-motion-prediction.

Johnathan Xie, Stefan Stojanov, Cristobal Eyzaguirre, Daniel L. K. Yamins, Jiajun Wu • 2025

Related benchmarks

Task	Dataset	Result
Robot Manipulation	CALVIN 10% ABCD → D	Success Rate (L=1): 84.1	11
Interaction Synthesis	FullBodyManipulation 1.0 (test)	Ts: 1.55	9
Track prediction	CALVIN ABC→D (test)	Success Rate (δ < 4): 43.7	7
Track prediction	UCF-101 (test)	Success Rate @ δ4: 25.2	7

