Beyond Motion Pattern: An Empirical Study of Physical Forces for Human Motion Understanding

About

Human motion understanding has advanced rapidly through vision-based progress in recognition, tracking, and captioning. However, most existing methods overlook physical cues such as joint actuation forces, which are fundamental in biomechanics. This gap motivates our study: do physically inferred forces enhance motion understanding, and if so, when? By incorporating forces into established motion understanding pipelines, we systematically evaluate their impact across baseline models on 3 major tasks: gait recognition, action recognition, and fine-grained video captioning. Across 8 benchmarks, incorporating forces yields consistent performance gains. For example, on CASIA-B, Rank-1 gait recognition accuracy improves from 89.52% to 90.39% (+0.87), with larger gains under challenging conditions: +2.7% when wearing a coat and +3.0% at the side view. On Gait3D, performance increases from 46.0% to 47.3% (+1.3). In action recognition, CTR-GCN gains +2.00% on Penn Action, while high-exertion classes such as punching/slapping improve by +6.96%. In video captioning, Qwen2.5-VL's ROUGE-L score rises from 0.310 to 0.339 (+0.029), indicating that physics-inferred forces enhance temporal grounding and semantic richness. These results demonstrate that force cues can substantially complement visual and kinematic features under dynamic, occluded, or appearance-varying conditions.
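The gains quoted in the abstract above are absolute differences between the baseline score and the force-augmented score. The following is a minimal sketch that recomputes them from the reported before/after pairs and also derives the relative gain. The `reported` dictionary and the `gains` helper are illustrative names introduced here, not part of the paper's code.

```python
# Baseline and force-augmented scores quoted in the abstract.
# The dictionary layout and the helper below are illustrative assumptions.
reported = {
    "CASIA-B Rank-1 (%)": (89.52, 90.39),
    "Gait3D Rank-1 (%)": (46.0, 47.3),
    "BoFiT ROUGE-L": (0.310, 0.339),
}


def gains(baseline, with_forces):
    """Return (absolute gain, relative gain in percent of the baseline)."""
    absolute = with_forces - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 3), round(relative, 2)


summary = {name: gains(b, f) for name, (b, f) in reported.items()}

for name, (absolute, relative) in summary.items():
    print(f"{name}: +{absolute} absolute, +{relative}% relative to baseline")
```

The absolute values reproduce the +0.87, +1.3, and +0.029 figures stated in the abstract. The relative gain is an extra view that makes scores on different scales (percent accuracy versus ROUGE-L) easier to compare.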

Anh Dao, Manh Tran, Yufei Zhang, Xiaoming Liu, Zijun Cui • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Action Recognition | NTU RGB+D X-sub 120 | Accuracy | 85.86 | 377 |
| Action Recognition | NTU 120 (Cross-Setup) | Accuracy | 84.31 | 112 |
| Action Recognition | NW-UCLA | Top-1 Acc | 93.97 | 67 |
| Gait Recognition | Gait3D | R-1 Acc | 47.3 | 49 |
| Action Recognition | NTU-60 (xsub) | Accuracy | 89.96 | 40 |
| Gait Recognition | CASIA-B | -- | -- | 18 |
| Action Recognition | Penn-Action | Accuracy | 98 | 17 |
| Action Recognition | NTU-60 (xview) | Accuracy | 94.9 | 12 |
| Gait Recognition | CCGR mini | Rank-1 Accuracy | 20.6 | 2 |
| Video Captioning | BoFiT | ROUGE-L | 0.339 | 2 |