
UBAR: Towards Fully End-to-End Task-Oriented Dialog Systems with GPT-2

About

This paper presents our task-oriented dialog system UBAR, which models task-oriented dialogs at the dialog session level. Specifically, UBAR is obtained by fine-tuning the large pre-trained unidirectional language model GPT-2 on the sequence of the entire dialog session, which is composed of the user utterance, belief state, database result, system act, and system response of every dialog turn. Additionally, UBAR is evaluated in a more realistic setting, where its dialog context has access to user utterances and all content it generated, such as belief states, system acts, and system responses. Experimental results on the MultiWOZ datasets show that UBAR achieves state-of-the-art performance in multiple settings, improving the combined score of response generation, policy optimization, and end-to-end modeling by 4.7, 3.5, and 9.4 points respectively. Thorough analyses demonstrate that the session-level training sequence formulation and the generated dialog context are essential for UBAR to operate as a fully end-to-end task-oriented dialog system in real life. We also examine the transfer ability of UBAR to new domains with limited data and provide visualization and a case study to illustrate the advantages of UBAR in modeling at the dialog session level.
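The session-level sequence described above can be illustrated with a minimal Python sketch. It is not the authors' code: the `Turn` class, the helper names, and the delimiter tokens such as `<sos_u>` are assumptions made for illustration. It shows one way to flatten every turn's user utterance, belief state, database result, system act, and system response into a single string, and how the context for a new turn could be built from earlier turns plus the current user utterance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One dialog turn; field names are illustrative, not from the paper."""
    user: str       # user utterance
    belief: str     # belief state
    db: str         # database result
    act: str        # system act
    response: str   # system response


# Assumed delimiter tokens marking the start and end of each segment.
SEGMENTS = [
    ("user", "<sos_u>", "<eos_u>"),
    ("belief", "<sos_b>", "<eos_b>"),
    ("db", "<sos_db>", "<eos_db>"),
    ("act", "<sos_a>", "<eos_a>"),
    ("response", "<sos_r>", "<eos_r>"),
]


def flatten_turn(turn: Turn) -> str:
    """Concatenate the five segments of one turn in a fixed order."""
    return " ".join(
        f"{sos} {getattr(turn, name)} {eos}" for name, sos, eos in SEGMENTS
    )


def flatten_session(turns: List[Turn]) -> str:
    """Concatenate all turns of a session into one training sequence."""
    return " ".join(flatten_turn(t) for t in turns)


def build_context(history: List[Turn], user_utterance: str) -> str:
    """Context for the next turn: all earlier turns (in the realistic
    setting these would hold the model's own generated belief states,
    acts, and responses) followed by the new user utterance."""
    parts = [flatten_turn(t) for t in history]
    parts.append(f"<sos_u> {user_utterance} <eos_u>")
    return " ".join(parts)
```

In a fine-tuning setup, the string returned by `flatten_session` would be tokenized and used as one language-modeling example, while `build_context` would supply the prompt from which the belief state, act, and response of the current turn are generated.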

Yunyi Yang, Yunhao Li, Xiaojun Quan • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Dialog State Tracking | MultiWOZ 2.1 (test) | Joint Goal Accuracy | 56.2 | 88 |
| End-to-end task-oriented dialogue | MultiWOZ (test) | Task Success Rate | 79.5 | 68 |
| End-to-end task-oriented dialogue | MultiWOZ 2.1 (test) | BLEU Score | 16.7 | 49 |
| Dialog State Tracking | MultiWOZ 2.0 (test) | Joint Goal Accuracy | 52.59 | 47 |
| Task-oriented Dialogue | MultiWOZ 2.0 (test) | Inform Rate | 95.4 | 37 |
| Task-oriented Dialogue | MultiWOZ 2.2 (test) | Inform Rate | 83.4 | 23 |
| End-to-end Dialogue Modelling | MultiWOZ 2.0 (test) | Inform Rate | 95.4 | 22 |
| End-to-end task-oriented dialogue | MultiWOZ 2.0 (test) | Inform Accuracy | 94 | 22 |
| Task-oriented Dialogue Response Generation | Multi-WOZ 2.1 (test) | BLEU | 16.5 | 22 |
| Task-oriented Dialogue | MultiWOZ 2.1 (test) | Inform Rate | 95.7 | 11 |
Showing 10 of 21 rows
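
The combined score mentioned in the abstract is commonly computed on MultiWOZ from the inform rate, success rate, and BLEU as (Inform + Success) × 0.5 + BLEU. The sketch below assumes that convention; the function name and the input values are illustrative and are not results from the paper or from the rows listed here.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """Combined score under the common MultiWOZ convention:
    the average of inform and success rates, plus BLEU."""
    return 0.5 * (inform + success) + bleu


# Illustrative inputs only, not reported results.
example = combined_score(inform=90.0, success=80.0, bleu=15.0)
print(example)  # 100.0
```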
