UBAR: Towards Fully End-to-End Task-Oriented Dialog Systems with GPT-2

About

This paper presents UBAR, a task-oriented dialog system that models task-oriented dialogs at the level of the dialog session. Specifically, UBAR is obtained by fine-tuning the large pre-trained unidirectional language model GPT-2 on the sequence of an entire dialog session, composed of the user utterance, belief state, database result, system act, and system response of every dialog turn. Additionally, UBAR is evaluated in a more realistic setting, in which its dialog context has access to the user utterances and to all the content it generated itself, such as belief states, system acts, and system responses. Experimental results on the MultiWOZ datasets show that UBAR achieves state-of-the-art performance in multiple settings, improving the combined score of response generation, policy optimization, and end-to-end modeling by 4.7, 3.5, and 9.4 points, respectively. Thorough analyses demonstrate that the session-level training sequence formulation and the generated dialog context are essential for UBAR to operate as a fully end-to-end task-oriented dialog system in real-world use. We also examine how well UBAR transfers to new domains with limited data, and provide visualizations and a case study to illustrate the advantages of modeling at the dialog session level.
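The session-level formulation described above can be sketched as plain string construction: each turn's five segments are wrapped in delimiters and concatenated, and all turns of a session form one training sequence for a language model such as GPT-2. This is a minimal illustration, not the authors' code. The `Turn` class, the `<sos_*>`/`<eos_*>` delimiter tokens, and the `[db_2]`-style database label are hypothetical choices made for this sketch. `inference_context` mirrors the evaluation setting from the abstract, where earlier turns contain the model's own generated belief states, acts, and responses rather than gold annotations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One dialog turn; every field is assumed to be already flattened to text."""
    user: str    # user utterance
    belief: str  # belief state
    db: str      # database result (hypothetical label format, e.g. "[db_2]")
    act: str     # system act
    resp: str    # system response


# Hypothetical start/end delimiters for each segment, in the order the
# abstract lists them: utterance, belief state, DB result, act, response.
SEGMENTS = [
    ("user", "<sos_u>", "<eos_u>"),
    ("belief", "<sos_b>", "<eos_b>"),
    ("db", "<sos_db>", "<eos_db>"),
    ("act", "<sos_a>", "<eos_a>"),
    ("resp", "<sos_r>", "<eos_r>"),
]


def turn_to_text(turn: Turn) -> str:
    """Wrap each field of a turn in its delimiters and join them with spaces."""
    return " ".join(f"{sos} {getattr(turn, field)} {eos}" for field, sos, eos in SEGMENTS)


def session_sequence(turns: List[Turn]) -> str:
    """Session-level training text: every turn of the dialog, in order."""
    return " ".join(turn_to_text(t) for t in turns)


def inference_context(history: List[Turn], user_utterance: str) -> str:
    """Context from which the model generates the next turn.

    `history` is expected to hold the model's own generated belief states,
    system acts, and responses (not gold labels); the new user utterance
    is appended last, and the model continues the sequence from there.
    """
    past = [turn_to_text(t) for t in history]
    return " ".join(past + [f"<sos_u> {user_utterance} <eos_u>"])


if __name__ == "__main__":
    t1 = Turn(
        user="i need a cheap hotel",
        belief="[hotel] pricerange cheap",
        db="[db_2]",
        act="[hotel] [request] area",
        resp="which area would you like?",
    )
    print(inference_context([t1], "the north please"))
```

Because training and inference share one sequence format, the model conditions on every earlier belief state and act in the session; the only difference at test time is whether those segments come from annotations or from the model itself.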

Yunyi Yang, Yunhao Li, Xiaojun Quan • 2020

Related benchmarks

Task	Dataset	Metric	Result
Dialogue State Tracking	MultiWOZ 2.1 (test)	Joint Goal Accuracy	56.2	105
Dialog State Tracking	MultiWOZ 2.1 (test)	Joint Goal Accuracy	56.2	88
End-to-end task-oriented dialogue	MultiWOZ (test)	Task Success Rate	79.5	78
End-to-end task-oriented dialogue	MultiWOZ 2.1 (test)	BLEU Score	16.7	57
Dialog State Tracking	MultiWOZ 2.0 (test)	Joint Goal Accuracy	52.59	47
Task-oriented Dialogue	MultiWOZ 2.0 (test)	Inform Rate	95.4	37
Dialogue State Tracking	MultiWOZ 2.0 (test)	Joint Goal Accuracy	52.59	29
Task-oriented Dialogue	MultiWOZ 2.2 (test)	Inform Rate	83.4	23
End-to-end Dialogue Modelling	MultiWOZ 2.0 (test)	Inform Rate	95.4	22
End-to-end task-oriented dialogue	MultiWOZ 2.0 (test)	Inform Accuracy	94	22
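Two of the quantities behind these numbers can be made concrete. Joint goal accuracy counts a turn as correct only when the predicted belief state matches the gold belief state exactly, and the "combined score" cited in the abstract is conventionally computed on MultiWOZ as the mean of Inform and Success plus BLEU. The sketch below is a minimal illustration under the assumption that a belief state is a dict from slot name to value; the slot names and all inputs are hypothetical. The rows above come from different leaderboards and dataset versions, so they should not be plugged into one combined-score calculation together.

```python
from typing import Dict, List

# Assumed representation: slot name -> value, e.g. {"hotel-area": "north"}.
BeliefState = Dict[str, str]


def joint_goal_accuracy(preds: List[BeliefState], golds: List[BeliefState]) -> float:
    """Percentage of turns whose predicted belief state equals the gold one exactly."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    exact = sum(1 for p, g in zip(preds, golds) if p == g)
    return 100.0 * exact / len(golds)


def combined_score(inform: float, success: float, bleu: float) -> float:
    """Conventional MultiWOZ combined score: mean of Inform and Success, plus BLEU."""
    return 0.5 * (inform + success) + bleu
```

A single wrong slot value makes the whole turn a miss, which is why joint goal accuracy is much stricter than per-slot accuracy.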

The table lists 10 of 24 benchmark results.
