
SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction

About

Text-based 3D human motion editing is a critical yet challenging task in computer vision and graphics. While training-free approaches have been explored, the recent release of the MotionFix dataset, which includes source-text-motion triplets, has opened new avenues for training, yielding promising results. However, existing methods struggle with precise control, often leading to misalignment between motion semantics and language instructions. In this paper, we introduce a related task, motion similarity prediction, and propose a multi-task training paradigm, where we train the model jointly on motion editing and motion similarity prediction to foster the learning of semantically meaningful representations. To complement this task, we design an advanced Diffusion-Transformer-based architecture that separately handles motion similarity prediction and motion editing. Extensive experiments demonstrate the state-of-the-art performance of our approach in both editing alignment and fidelity.
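The joint training described in the abstract can be pictured as a weighted sum of two objectives, one for editing and one for similarity prediction. The following is only a minimal sketch under stated assumptions: the abstract does not give the loss form, so the mean-squared-error terms, the `sim_weight` parameter, and the function names are all hypothetical and are not taken from the paper.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(edit_pred, edit_true, sim_pred, sim_true, sim_weight=0.5):
    """Hypothetical multi-task objective: editing loss plus weighted similarity loss.

    Assumptions (not from the paper): both terms are MSE, and the two
    tasks are balanced by a single scalar `sim_weight`.
    """
    edit_loss = mse(edit_pred, edit_true)  # assumed editing (e.g. denoising) term
    sim_loss = mse(sim_pred, sim_true)     # assumed motion-similarity regression term
    return edit_loss + sim_weight * sim_loss


# Toy values, purely illustrative.
loss = joint_loss([1.0, 2.0], [1.0, 0.0], [0.5], [1.0], sim_weight=0.5)
```

In this toy form, the similarity term acts as an auxiliary signal: gradients from both terms would reach any shared parameters, which is the general idea behind multi-task training.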

Zhengyuan Li, Kai Cheng, Anindita Ghosh, Uttaran Bhattacharya, Liangyan Gui, Aniket Bera • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Instruction-Based Motion Editing | MotionFix (Batch) | R@1: 70.62 | 10 |
| Instruction-Based Motion Editing | MotionFix (Full) | R@1: 25.49 | 9 |
| Text-Based Motion Editing | MotionFix Batch 4 (train/val) | R@1: 70.62 | 7 |
| Text-Based Motion Editing | MotionFix 4 (test) | R@1: 25.49 | 7 |
| Edited-to-Target Retrieval | MotionFix (test) | R@1: 70.62 | 7 |
| Edited-to-Source Retrieval | MotionFix (test) | R@1: 72.71 | 7 |
| Motion Editing | Ego-Exo4D Basketball and Soccer (test) | Mikan Pose Improvement (P): 2.26 | 4 |
| Human Motion Editing | Perceptual Study | Alignment: 2.17 | 3 |
| Text-Based Human Motion Editing | MotionFix (test) | R@1: 72.71 | 3 |
| Motion Editing | Kyokushin Karate Dataset | Reverse Punch P: 2.2 | 3 |
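Most results in this list are reported as R@1, which conventionally denotes retrieval recall at rank 1: the percentage of queries whose correct match is the top-ranked candidate. The sketch below shows that computation on a hypothetical similarity matrix. The scores, the function name `recall_at_k`, and the convention that query `i` matches candidate `i` are illustrative assumptions, not the benchmark's actual evaluation code.

```python
def recall_at_k(similarity, k=1):
    """Fraction of queries whose true match (same index) is in the top-k candidates.

    `similarity[i][j]` is the score between query i and candidate j;
    the correct candidate for query i is assumed to be index i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Hypothetical scores: rows are edited motions, columns are target motions.
scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]
r_at_1 = 100.0 * recall_at_k(scores, k=1)  # expressed as a percentage
```

Here queries 0 and 2 rank their own target first while query 1 does not, so two of three queries count as hits at rank 1.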
