DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
About
A long-standing goal of task-oriented dialogue research is the ability to flexibly adapt dialogue models to new domains. To advance research in this direction, we introduce DialoGLUE (Dialogue Language Understanding Evaluation), a public benchmark consisting of 7 task-oriented dialogue datasets covering 4 distinct natural language understanding tasks, designed to encourage dialogue research in representation-based transfer, domain adaptation, and sample-efficient task learning. We release several strong baseline models that, through pre-training on a large open-domain dialogue corpus and task-adaptive self-supervised training, improve over a vanilla BERT architecture and achieve state-of-the-art results on 5 of the 7 tasks. Through the DialoGLUE benchmark, the baseline methods, and our evaluation scripts, we hope to facilitate progress towards the goal of developing more general task-oriented dialogue models.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Dialogue State Tracking | MultiWOZ 2.1 (test) | Joint Goal Accuracy | 58.7 | 85 |
| Intent Classification | Banking77 | Accuracy | 94.77 | 24 |
| Intent Detection | HWU 10-shot (test) | Accuracy | 86.28 | 16 |
| Intent Detection | CLINC 10-shot (test) | Accuracy | 93.97 | 16 |
| Intent Detection | BANKING 10-shot (test) | Accuracy | 85.95 | 16 |
| Intent Detection | HWU 5-shot (test) | Accuracy | 80.01 | 12 |
| Intent Detection | CLINC 5-shot (test) | Accuracy | 90.49 | 12 |
| Intent Detection | BANKING 5-shot (test) | Accuracy | 77.75 | 12 |
| Intent Detection | HWU Full (test) | Accuracy | 93.03 | 11 |
| Intent Detection | CLINC Full (test) | Accuracy | 97.31 | 11 |
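The table reports two metrics: plain accuracy for intent classification/detection, and joint goal accuracy for dialogue state tracking. As a minimal sketch of how these are typically computed (function names and state representation are illustrative; the official DialoGLUE evaluation scripts may differ in detail):

```python
def intent_accuracy(predicted_intents, gold_intents):
    """Fraction of utterances whose predicted intent label matches the gold label."""
    assert len(predicted_intents) == len(gold_intents)
    correct = sum(p == g for p, g in zip(predicted_intents, gold_intents))
    return correct / len(gold_intents)

def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of dialogue turns where the ENTIRE predicted state
    (every slot-value pair) exactly matches the gold state.
    A single wrong or missing slot makes the whole turn incorrect,
    which is why joint goal accuracy is much lower than per-slot accuracy."""
    assert len(predicted_states) == len(gold_states)
    correct = sum(p == g for p, g in zip(predicted_states, gold_states))
    return correct / len(gold_states)
```

For example, a turn predicted as `{"hotel-area": "north", "hotel-stars": "4"}` only counts as correct under joint goal accuracy if the gold state contains exactly those two slot-value pairs and no others.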