Autoregressive Flow Matching for Motion Prediction
About
Motion prediction has been studied in different contexts, with models trained on narrow distributions and applied to downstream tasks in human motion prediction and robotics. Meanwhile, recent efforts to scale video prediction have demonstrated impressive visual realism, yet these models struggle to accurately model complex motions despite their massive scale. Inspired by the scaling of video generation, we develop autoregressive flow matching (ARFM), a new method for probabilistic modeling of sequential continuous data, and train it on diverse video datasets to generate future point track locations over long horizons. To evaluate our model, we develop benchmarks that measure how well motion prediction models predict human and robot motion. Our model is able to predict complex motions, and we demonstrate that conditioning robot action prediction and human motion prediction on predicted future tracks can significantly improve downstream task performance. Code and models are publicly available at: https://github.com/Johnathan-Xie/arfm-motion-prediction.
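For readers unfamiliar with the term, the following is a minimal sketch of what a flow matching objective applied autoregressively to a continuous sequence can look like. It uses the standard linear-interpolation form of conditional flow matching; the abstract does not state the objective, so the paper's exact formulation may differ. The symbols $s_i$ (the $i$-th sequence element, e.g. a set of track locations), $z_0$ (Gaussian noise), $v_\theta$ (the learned velocity network), and $N$ (the sequence length) are assumptions introduced here for illustration.

```latex
% Assumed setup: predict element s_i given the past s_{<i} = (s_1, ..., s_{i-1}).
\begin{aligned}
z_0 &\sim \mathcal{N}(0, I), \qquad t \sim \mathcal{U}[0, 1], \\
z_t &= (1 - t)\, z_0 + t\, s_i, \\
\mathcal{L}(\theta) &= \mathbb{E}_{i,\, t,\, z_0}
  \left\| v_\theta(z_t, t, s_{<i}) - (s_i - z_0) \right\|^2 .
\end{aligned}

% Assumed sampling loop: for each i, draw z_0 and integrate
%   dz/dt = v_\theta(z, t, s_{<i})  from t = 0 to t = 1,
% take s_i = z at t = 1, append it to the context, and repeat for i + 1.
\hat{s}_i = z_0 + \int_0^1 v_\theta(z_t, t, \hat{s}_{<i})\, dt,
\qquad i = 1, \dots, N .
```

Under this reading, each step yields a sample from a learned conditional distribution over the next element rather than a single point estimate, which is consistent with the abstract's description of ARFM as a probabilistic model.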
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robot Manipulation | CALVIN 10% ABCD → D | Success Rate (L=1) | 84.1 | 11 |
| Interaction Synthesis | FullBodyManipulation 1.0 (test) | Ts | 1.55 | 9 |
| Track prediction | CALVIN ABC→D (test) | Success Rate (δ < 4) | 43.7 | 7 |
| Track prediction | UCF-101 (test) | Success Rate @ δ4 | 25.2 | 7 |