
Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots

About

Co-speech gestures enhance interactions between humans as well as between humans and robots. Existing robots rely on rule-based speech-gesture associations, which require human labor and expert prior knowledge to build. We present a learning-based co-speech gesture generation model trained on 52 hours of TED talks. The proposed end-to-end neural network consists of an encoder for speech-text understanding and a decoder that generates a sequence of gestures. The model produces a variety of gestures, including iconic, metaphoric, deictic, and beat gestures. In a subjective evaluation, participants rated the generated gestures as human-like and well matched to the speech content. We also demonstrate real-time co-speech gesture generation on a NAO robot.
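The encoder-decoder design described above can be sketched as a minimal sequence-to-sequence model: a recurrent encoder summarizes the input word sequence, and a recurrent decoder unrolls a sequence of pose frames conditioned on that summary. This is a hedged illustration only; the class names, vocabulary size, hidden size, and pose dimensionality here are hypothetical and not taken from the paper.

```python
# Minimal seq2seq sketch: speech-text encoder -> gesture (pose sequence) decoder.
# All dimensions and names are illustrative assumptions, not the authors' model.
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, word_ids):
        # word_ids: (batch, n_words) -> outputs, final hidden state (1, batch, hidden)
        return self.gru(self.embed(word_ids))

class GestureDecoder(nn.Module):
    def __init__(self, pose_dim=10, hidden_dim=128):
        super().__init__()
        self.gru = nn.GRU(pose_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, pose_dim)

    def forward(self, prev_poses, h):
        # prev_poses: (batch, n_frames, pose_dim); h: encoder's final hidden state
        y, h = self.gru(prev_poses, h)
        return self.out(y), h  # predicted pose per frame

# Usage: encode two 12-word utterances, decode 30 pose frames each.
enc, dec = TextEncoder(), GestureDecoder()
words = torch.randint(0, 1000, (2, 12))
_, h = enc(words)
poses, _ = dec(torch.zeros(2, 30, 10), h)
print(tuple(poses.shape))  # (2, 30, 10)
```

In training, `prev_poses` would be the ground-truth poses shifted by one frame (teacher forcing); at inference time the decoder feeds its own predictions back in.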

Youngwoo Yoon, Woo-Ri Ko, Minsu Jang, Jaeyeon Lee, Jaehong Kim, Geehyuk Lee (2018)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D co-speech gesture generation | BEAT-ETrans (test) | FGD (h+t) | 40.95 | 14 |
| 3D co-speech gesture generation | TED-ETrans (test) | FGD (h+t) | 29.6 | 14 |
| Co-speech gesture synthesis | TED (test) | FGD | 18.154 | 9 |
| Gesture Generation | BEAT official recomputed (test) | Hellinger Distance (Avg) | 0.146 | 7 |
| Co-speech gesture generation | TED Gesture | FGD | 18.154 | 7 |
| Co-speech gesture generation | TED Gesture & TED Expressive User Study (test) | Naturalness | 1.22 | 7 |
| Co-speech gesture generation | TED Expressive | FGD | 54.92 | 7 |
| Gesture Synthesis | TED Gesture (test) | MAJE | 45.62 | 7 |
| Gesture Synthesis | BEAT (Body-Expression-Audio-Text) 1.0 (test) | FGD | 261.3 | 7 |
| Speech-driven gesture generation | BEAT (test) | Global CCA | 42.9 | 7 |

Showing 10 of 14 rows
