Beyond Motion Pattern: An Empirical Study of Physical Forces for Human Motion Understanding
About
Human motion understanding has advanced rapidly through vision-based progress in recognition, tracking, and captioning. However, most existing methods overlook physical cues, such as joint actuation forces, that are fundamental in biomechanics. This gap motivates our study: whether and when do physically inferred forces enhance motion understanding? By incorporating forces into established motion understanding pipelines, we systematically evaluate their impact across baseline models on three major tasks: gait recognition, action recognition, and fine-grained video captioning. Across 8 benchmarks, incorporating forces yields consistent performance gains. For example, on CASIA-B, Rank-1 gait recognition accuracy improves from 89.52% to 90.39% (+0.87), with larger gains under challenging conditions: +2.7% when wearing a coat and +3.0% at the side view. On Gait3D, performance increases from 46.0% to 47.3% (+1.3). In action recognition, CTR-GCN gains +2.00% on Penn Action, and high-exertion classes such as punching/slapping improve by +6.96%. Even in video captioning, Qwen2.5-VL's ROUGE-L score rises from 0.310 to 0.339 (+0.029), indicating that physics-inferred forces enhance temporal grounding and semantic richness. These results demonstrate that force cues can substantially complement visual and kinematic features under dynamic, occluded, or appearance-varying conditions.
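The gains quoted above mix bare deltas ("+0.87") with percent-suffixed ones ("+2.7%"). For the three results where both the baseline and the force-augmented score are given, the arithmetic is consistent with absolute differences rather than relative improvements. The sketch below is only a sanity check of that reading, using the numbers from the abstract; the names `reported`, `gains`, and `summary` are illustrative and not from the paper. The deltas without a stated baseline (+2.7%, +3.0%, +2.00%, +6.96%) cannot be checked this way.

```python
# Baseline and force-augmented scores quoted in the abstract.
# Dictionary keys are illustrative labels, not identifiers from the paper.
reported = {
    "CASIA-B Rank-1 (%)": (89.52, 90.39),
    "Gait3D (%)": (46.0, 47.3),
    "Qwen2.5-VL ROUGE-L": (0.310, 0.339),
}


def gains(baseline, with_forces):
    """Return (absolute difference, relative change in percent)."""
    absolute = with_forces - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


summary = {name: gains(b, f) for name, (b, f) in reported.items()}

for name, (absolute, relative) in summary.items():
    print(f"{name}: {absolute:+.3f} absolute, {relative:+.2f}% relative")
```

Under this reading, the CASIA-B gain of +0.87 is a difference in percentage points, which corresponds to a relative improvement of just under 1% over the baseline.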
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Action Recognition | NTU RGB+D X-sub 120 | Accuracy | 85.86 | 377 |
| Action Recognition | NTU 120 (Cross-Setup) | Accuracy | 84.31 | 112 |
| Action Recognition | NW-UCLA | Top-1 Acc | 93.97 | 67 |
| Gait Recognition | Gait3D | R-1 Acc | 47.3 | 49 |
| Action Recognition | NTU-60 (xsub) | Accuracy | 89.96 | 40 |
| Gait Recognition | CASIA-B | -- | -- | 18 |
| Action Recognition | Penn-Action | Accuracy | 98 | 17 |
| Action Recognition | NTU-60 (xview) | Accuracy | 94.9 | 12 |
| Gait Recognition | CCGR mini | Rank-1 Accuracy | 20.6 | 2 |
| Video Captioning | BoFiT | ROUGE-L | 0.339 | 2 |