Hierarchical Pre-training for Sequence Labelling in Spoken Dialog

About

Sequence labelling tasks such as Dialog Act and Emotion/Sentiment identification are a key component of spoken dialog systems. In this work, we propose a new approach to learning generic representations adapted to spoken dialog, which we evaluate on a new benchmark we call the Sequence labellIng evaLuatIon benChmark fOr spoken laNguagE (SILICONE). SILICONE is model-agnostic and contains 10 datasets of various sizes. We obtain our representations with a hierarchical encoder based on transformer architectures, for which we extend two well-known pre-training objectives. Pre-training is performed on OpenSubtitles, a large corpus of spoken dialog containing over 2.3 billion tokens. We demonstrate that hierarchical encoders achieve competitive results with consistently fewer parameters than state-of-the-art models, and we show their importance for both pre-training and fine-tuning.
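To make the idea of a hierarchical encoder for sequence labelling concrete, here is a minimal, dependency-free sketch. It is not the authors' model: the token-level transformer is replaced by mean pooling of token embeddings, and the dialog-level transformer by a windowed average with a residual connection. All function names, the toy embeddings, and the linear label scorer are hypothetical. The point it illustrates is the two-level structure: tokens are first encoded into one vector per utterance, utterance vectors are then contextualized across the dialog, and finally one label is predicted per utterance.

```python
from typing import Dict, List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors component-wise."""
    if not vectors:
        raise ValueError("cannot pool an empty list of vectors")
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def encode_utterance(tokens: List[str], embeddings: Dict[str, Vector]) -> Vector:
    """Level 1 (hypothetical stand-in for a token-level transformer):
    look up each token embedding and mean-pool into one utterance vector."""
    return mean_pool([embeddings[t] for t in tokens])


def contextualize(utterance_vecs: List[Vector], window: int = 1) -> List[Vector]:
    """Level 2 (hypothetical stand-in for a dialog-level transformer):
    mix each utterance with its neighbours, plus a residual connection."""
    n = len(utterance_vecs)
    out = []
    for i, own in enumerate(utterance_vecs):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        context = mean_pool(utterance_vecs[lo:hi])
        # Residual connection keeps each utterance's own content dominant.
        out.append([a + b for a, b in zip(own, context)])
    return out


def label_dialog(
    dialog: List[List[str]],
    embeddings: Dict[str, Vector],
    label_weights: Dict[str, Vector],
    window: int = 1,
) -> List[str]:
    """Sequence labelling: predict one label per utterance of the dialog."""
    utterance_vecs = [encode_utterance(u, embeddings) for u in dialog]
    contextual = contextualize(utterance_vecs, window)
    labels = []
    for vec in contextual:
        # Linear scorer: dot product with a per-label weight vector.
        scores = {lab: sum(w * x for w, x in zip(ws, vec)) for lab, ws in label_weights.items()}
        labels.append(max(scores, key=scores.get))
    return labels


if __name__ == "__main__":
    emb = {"hi": [1.0, 0.0], "there": [1.0, 0.0], "how": [0.0, 1.0],
           "are": [0.0, 1.0], "you": [0.0, 1.0], "?": [0.0, 1.0]}
    weights = {"greeting": [1.0, 0.0], "question": [0.0, 1.0]}
    dialog = [["hi", "there"], ["how", "are", "you", "?"]]
    print(label_dialog(dialog, emb, weights))
```

Running the script prints `['greeting', 'question']`. In a real system both levels would be learned transformer encoders; the sketch only shows how the hierarchy separates token-level from utterance-level modelling.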

Emile Chapuis, Pierre Colombo, Matteo Manica, Matthieu Labeau, Chloe Clavel · 2020

Related benchmarks

Task	Dataset	Result
Emotion Recognition in Conversation	MELD (test)	Weighted F1: 61.9	143
Emotion Recognition in Conversation	DailyDialog (test)	--	16
Spoken Language Understanding	SILICONE 1.0 (test)	Avg Score: 74.3	6
Dialogue Act Classification	MRDA (test)	F1 Score: 92.4	3
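The MELD result above is reported as weighted F1, which the table appears to express as a percentage. The sketch below shows the usual definition under that assumption: per-class F1 averaged with weights proportional to each class's support (its number of true instances). The function name and the toy labels are hypothetical; as far as we know, scikit-learn's `f1_score(..., average="weighted")` computes the same quantity.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted average of per-class F1 scores (range 0.0 to 1.0)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)          # number of true instances per class
    predicted = Counter(y_pred)        # number of predictions per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = predicted[label]
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Classes that never occur in y_true have zero support and add nothing.
        score += f1 * n_true / total
    return score


if __name__ == "__main__":
    gold = ["joy", "joy", "anger", "neutral"]
    pred = ["joy", "anger", "anger", "neutral"]
    print(round(100 * weighted_f1(gold, pred), 1))
```

For this toy example the per-class F1 scores are 2/3 (joy, support 2), 2/3 (anger, support 1) and 1 (neutral, support 1), so the script prints `75.0`. Weighting by support means frequent classes dominate the score, which matters on imbalanced emotion datasets.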