Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation

About

Generating speech-consistent body and gesture movements is a long-standing problem in virtual avatar creation. Previous studies often synthesize pose movement in a holistic manner, where the poses of all joints are generated simultaneously. Such a straightforward pipeline fails to generate fine-grained co-speech gestures. One observation is that the hierarchical semantics in speech and the hierarchical structures of human gestures can both be naturally described at multiple granularities and associated with each other. To fully utilize the rich connections between speech audio and human gestures, we propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation. In HA2G, a Hierarchical Audio Learner extracts audio representations across semantic granularities. A Hierarchical Pose Inferer then renders the entire human pose gradually, in a hierarchical manner. To enhance the quality of synthesized gestures, we develop a contrastive learning strategy based on audio-text alignment for better audio representations. Extensive experiments and human evaluation demonstrate that the proposed method renders realistic co-speech gestures and outperforms previous methods by a clear margin. Project page: https://alvinliu0.github.io/projects/HA2G

Xian Liu, Qianyi Wu, Hang Zhou, Yinghao Xu, Rui Qian, Xinyi Lin, Xiaowei Zhou, Wayne Wu, Bo Dai, Bolei Zhou • 2022
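
The abstract mentions a contrastive learning strategy based on audio-text alignment but does not give its exact form. As a rough illustration only, the sketch below uses a generic InfoNCE-style objective, which is an assumption and not necessarily the loss used in HA2G. The function names, the temperature value, and the toy embeddings are all hypothetical. The idea is that each audio embedding should be more similar to its matched text embedding than to the other text embeddings in the batch.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(audio_embs, text_embs, temperature=0.1):
    """Generic audio-to-text InfoNCE loss (an assumed stand-in, not the paper's loss).

    audio_embs[i] and text_embs[i] are treated as a matched (positive) pair;
    every other text embedding in the batch acts as a negative.
    """
    n = len(audio_embs)
    total = 0.0
    for i in range(n):
        logits = [
            cosine_similarity(audio_embs[i], text_embs[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        # Negative log-probability of picking the matched text.
        total += log_denominator - logits[i]
    return total / n


# Toy, hand-made embeddings (hypothetical values, for illustration only).
aligned_audio = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[0.9, 0.1], [0.1, 0.9]]
shuffled_text = [aligned_text[1], aligned_text[0]]

loss_aligned = info_nce_loss(aligned_audio, aligned_text)
loss_shuffled = info_nce_loss(aligned_audio, shuffled_text)
```

With these toy inputs the loss is lower when each audio vector is paired with its matching text vector than when the pairs are shuffled, which is the behavior a contrastive alignment objective is meant to reward.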

Related benchmarks

Task	Dataset	Metric	Result
Co-Speech Gesture Video Generation	PATS (test)	Diversity	3.31	30
Co-speech 3D Gesture Synthesis	BEAT2 (test)	FGD	12.32	27
Gesture Generation	BEAT-2 (test)	BC	6.779	22
Gesture Generation	BEAT2	FGD	12.32	17
Co-speech motion generation	BEATX (test)	FGD	19.364	16
3D co-speech gesture generation	BEAT-ETrans (test)	FGD (h+t)	7.28	14
3D co-speech gesture generation	TED-ETrans (test)	FGD_h+t	16.72	14
Co-speech gesture generation	BEAT	FGD	12.32	13
Holistic Motion Generation	BEAT2	FGD	6.413	12
Gesture Generation	BEAT (test)	BC	67.7	12

Showing 10 of 26 rows
