DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding

About

Human motion is inherently continuous and dynamic, posing significant challenges for generative models. While discrete generation methods are widely used, they suffer from limited expressiveness and frame-wise noise artifacts. In contrast, continuous approaches produce smoother, more natural motion but often struggle to adhere to conditioning signals due to high-dimensional complexity and limited training data. To resolve this 'discord' between discrete and continuous representations we introduce DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding, a novel method that leverages rectified flow to decode discrete motion tokens in the continuous, raw motion space. Our core idea is to frame token decoding as a conditional generation task, ensuring that DisCoRD captures fine-grained dynamics and achieves smoother, more natural motions. Compatible with any discrete-based framework, our method enhances naturalness without compromising faithfulness to the conditioning signals on diverse settings. Extensive evaluations demonstrate that DisCoRD achieves state-of-the-art performance, with FID of 0.032 on HumanML3D and 0.169 on KIT-ML. These results establish DisCoRD as a robust solution for bridging the divide between discrete efficiency and continuous realism. Project website: https://whwjdqls.github.io/discord-motion/

Jungbin Cho, Junwan Kim, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu• 2024

Related benchmarks

Task	Dataset	Result
Text-to-motion generation	HumanML3D (test)	FID0.02	576
Text-to-motion generation	HumanML3D	FID0.032	96
Text-to-Motion Synthesis	KIT-ML	R Precision Top 377.5	61
Text-to-motion generation	HumanML3D MARDM-67 evaluator (test)	FID0.053	16

Showing 4 of 4 rows

Other info

Follow for update

@wizwand_team Discord