Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting
About
Multi-person pose forecasting remains a challenging problem, especially in modeling fine-grained human body interaction in complex crowd scenarios. Existing methods typically represent the whole pose sequence as a temporal series, yet overlook interactive influences among people based on skeletal body parts. In this paper, we propose a novel Trajectory-Aware Body Interaction Transformer (TBIFormer) for multi-person pose forecasting that effectively models body part interactions. Specifically, we construct a Temporal Body Partition Module that transforms all the pose sequences into a Multi-Person Body-Part sequence to retain spatial and temporal information based on body semantics. Then, we devise a Social Body Interaction Self-Attention (SBI-MSA) module that uses the transformed sequence to learn body part dynamics for inter- and intra-individual interactions. Furthermore, unlike prior Euclidean distance-based spatial encodings, we present a novel and efficient Trajectory-Aware Relative Position Encoding for SBI-MSA to offer discriminative spatial information and additional interactive clues. We empirically evaluate our framework over both short- and long-term horizons on CMU-Mocap, MuPoTS-3D, and synthesized datasets (6 to 10 persons), and demonstrate that our method greatly outperforms the state-of-the-art methods. Code will be made publicly available upon acceptance.
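To make the idea of a Multi-Person Body-Part sequence more concrete, the following is a minimal sketch of how per-person pose sequences could be regrouped into per-body-part sequences. It is not the authors' Temporal Body Partition Module: the 15-joint skeleton, the `BODY_PARTS` grouping into five parts of three joints each, and the function name `to_body_part_sequence` are all hypothetical choices made for illustration, and the sketch performs only the regrouping step without any learned layers.

```python
import numpy as np

# Hypothetical grouping (an assumption, not taken from the paper):
# a 15-joint skeleton split into 5 body parts of 3 joints each.
BODY_PARTS = {
    "torso": [0, 1, 2],
    "left_arm": [3, 4, 5],
    "right_arm": [6, 7, 8],
    "left_leg": [9, 10, 11],
    "right_leg": [12, 13, 14],
}


def to_body_part_sequence(poses):
    """Regroup poses of shape (N, T, J, 3) into (N * P, T, D).

    N = persons, T = frames, J = joints, P = body parts,
    D = 3 * joints per part. Each output row is the temporal
    sequence of one body part of one person.
    """
    n, t, _, _ = poses.shape
    # Select the joints of each part and flatten their xyz coordinates.
    parts = [poses[:, :, idx, :].reshape(n, t, -1) for idx in BODY_PARTS.values()]
    # Stack to (N, P, T, D), then merge the person and part axes.
    stacked = np.stack(parts, axis=1)
    return stacked.reshape(n * len(BODY_PARTS), t, -1)


# Usage: 3 persons, 50 frames, 15 joints.
poses = np.random.default_rng(0).normal(size=(3, 50, 15, 3))
tokens = to_body_part_sequence(poses)
print(tokens.shape)  # (15, 50, 9)
```

In this layout, rows 0 to 4 hold the five body parts of the first person, rows 5 to 9 those of the second, and so on, so an attention layer operating over the first axis would see body parts of the same person and of different persons in one sequence.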
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multi-person motion prediction | ExPI (common action split) | A1 (A-frame) Error | 50 | 84 |
| Multi-person motion prediction | ExPI unseen action | A8 Error | 56 | 21 |
| Multi-person 3D motion prediction | CMU-Mocap 3 persons | MPJPE (1s Horizon) | 182 | 13 |
| Multi-person motion prediction | CMU-Mocap UMPM 3 persons | JPE (0.2s) | 30 | 8 |
| Multi-agent human pose forecasting | 3DPW (test) | JPE | 153.9 | 8 |
| Multi-agent human pose forecasting | JRDB-GlobMultiPose Short-term (test) | JPE | 257.1 | 8 |
| Multi-agent human pose forecasting | CMU-Mocap UMPM (test) | JPE | 170 | 8 |
| Multi-agent human pose forecasting | JRDB-GlobMultiPose Long-term (test) | JPE | 443.2 | 8 |
| Multi-person motion prediction | Mix2 10 persons | JPE (0.2s) | 34 | 7 |
| Multi-person motion prediction | Mix1 6 persons | JPE (0.2s) | 34 | 7 |
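The table reports joint position errors such as MPJPE. As a reading aid, here is a minimal sketch of the standard mean per-joint position error, that is, the Euclidean distance between predicted and ground-truth joint positions averaged over all joints. The function name `mpjpe` and the array layout are assumptions for illustration; the exact evaluation protocol of each benchmark (units, alignment, horizon, aggregation over persons) is not stated here and may differ.

```python
import numpy as np


def mpjpe(pred, target):
    """Mean per-joint position error for arrays of shape (..., J, 3).

    Computes the Euclidean distance per joint, then averages over
    every leading axis (e.g. persons and frames) and over joints.
    """
    return float(np.linalg.norm(pred - target, axis=-1).mean())


# Usage: 2 persons, 4 frames, 15 joints, every joint off by (3, 4, 0).
target = np.zeros((2, 4, 15, 3))
pred = target + np.array([3.0, 4.0, 0.0])
print(mpjpe(pred, target))  # 5.0
```

The result is in the same unit as the input coordinates, so values are only comparable across methods when the coordinate scale and evaluation horizon match.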