Prime and Reach: Synthesising Body Motion for Gaze-Primed Object Reach

About

Human motion generation is a challenging task that aims to create realistic motion imitating natural human behaviour. We focus on the well-studied behaviour of priming an object/location for pick up or put down - that is, the spotting of an object/location from a distance, known as gaze priming, followed by the motion of approaching and reaching the target location. To that end, we curate, for the first time, 23.7K gaze-primed human motion sequences for reaching target object locations from five publicly available datasets, i.e., HD-EPIC, MoGaze, HOT3D, ADT, and GIMO. We pre-train a text-conditioned diffusion-based motion generation model, then fine-tune it conditioned on goal pose or location, on our curated sequences. Importantly, we evaluate the ability of the generated motion to imitate natural human movement through several metrics, including 'Reach Success' and a newly introduced 'Prime Success' metric. Tested on five datasets, our model generates diverse full-body motion, exhibiting both priming and reaching behaviour, and outperforming baselines and recent methods.

Masashi Hatano, Saptarshi Sinha, Jacob Chalk, Wei-Hong Li, Hideo Saito, Dima Damen • 2025

Related benchmarks

Task	Dataset	Result
Motion Generation	HD-EPIC curated P&R sequences	Prime Success 66.31	8
Motion Generation	MoGaze curated P&R sequences	Prime Success 48.51	8
Motion Generation	HOT3D curated P&R sequences	Prime Success 75.26	8
Motion Generation	ADT curated P&R sequences	Prime Success 28.65	8
Motion Generation	GIMO curated P&R sequences	Prime Success 47.61	8

