TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue
About
The underlying difference in linguistic patterns between general text and task-oriented dialogue makes existing pre-trained language models less useful in practice. In this work, we unify nine human-human, multi-turn task-oriented dialogue datasets for language modeling. To better model dialogue behavior during pre-training, we incorporate user and system tokens into the masked language modeling objective. We also propose a contrastive objective function to simulate the response selection task. Our pre-trained task-oriented dialogue BERT (TOD-BERT) outperforms strong baselines like BERT on four downstream task-oriented dialogue applications: intention recognition, dialogue state tracking, dialogue act prediction, and response selection. We further show that TOD-BERT has stronger few-shot ability, which can mitigate the data scarcity problem in task-oriented dialogue.
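The contrastive response-selection objective described above can be viewed as in-batch classification: each dialogue context should score its own response highest among all responses in the batch. The following NumPy sketch (illustrative only, not the authors' implementation; the function name and toy data are assumptions) shows the core loss computation.

```python
import numpy as np

def contrastive_response_loss(context_emb, response_emb):
    """In-batch contrastive loss for response selection.

    context_emb, response_emb: (batch, dim) arrays of encoder outputs
    (e.g. [CLS]-style sentence embeddings). Context i's positive response
    is response i; all other responses in the batch act as negatives.
    """
    # (batch, batch) similarity matrix: entry (i, j) scores context i
    # against response j via a dot product
    logits = context_emb @ response_emb.T
    # numerically stable log-softmax over each row
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives lie on the diagonal; minimize their negative log-likelihood
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
batch, dim = 4, 8
ctx = rng.normal(size=(batch, dim))
# matched responses sit close to their contexts, so the loss is low
resp = ctx + 0.01 * rng.normal(size=(batch, dim))
loss = contrastive_response_loss(ctx, resp)
```

With well-matched context/response pairs the diagonal dominates the similarity matrix and the loss approaches zero; with random pairings it approaches log(batch size).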
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Dialog State Tracking | MultiWOZ 2.1 (test) | Joint Goal Accuracy | 48 | 88 |
| Intent Classification | HINT3 10-shot | Accuracy | 66.42 | 23 |
| Intent Classification | MCID 10-shot | Accuracy | 74.66 | 23 |
| Intent Classification | HINT3 5-shot | Accuracy | 56.33 | 23 |
| Intent Classification | BANKING77 5-shot (test) | Accuracy | 67.69 | 20 |
| Intent Recognition | OOS (test) | Overall Accuracy | 86.6 | 19 |
| Response Selection | MWOZ 2.1 | Accuracy (1/100) | 65.8 | 17 |
| Intent Classification | BANKING77 10-shot (test) | Accuracy | 79.71 | 12 |
| Intent Classification | HWU64 10-shot (test) | Accuracy | 82.15 | 12 |
| Intent Classification | HWU64 5-shot (test) | Accuracy | 74.83 | 12 |