DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue

About

A long-standing goal of task-oriented dialogue research is the ability to flexibly adapt dialogue models to new domains. To progress research in this direction, we introduce DialoGLUE (Dialogue Language Understanding Evaluation), a public benchmark consisting of 7 task-oriented dialogue datasets covering 4 distinct natural language understanding tasks, designed to encourage dialogue research in representation-based transfer, domain adaptation, and sample-efficient task learning. We release several strong baseline models, demonstrating performance improvements over a vanilla BERT architecture and state-of-the-art results on 5 out of 7 tasks, by pre-training on a large open-domain dialogue corpus and task-adaptive self-supervised training. Through the DialoGLUE benchmark, the baseline methods, and our evaluation scripts, we hope to facilitate progress towards the goal of developing more general task-oriented dialogue models.

Shikib Mehri, Mihail Eric, Dilek Hakkani-Tur • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Dialogue State Tracking | MultiWOZ 2.1 (test) | Joint Goal Accuracy | 58.7 | 85 |
| Intent Classification | Banking77 | Accuracy | 94.77 | 24 |
| Intent Detection | HWU 10-shot (test) | Accuracy | 86.28 | 16 |
| Intent Detection | CLINC 10-shot (test) | Accuracy | 93.97 | 16 |
| Intent Detection | BANKING 10-shot (test) | Accuracy | 85.95 | 16 |
| Intent Detection | HWU 5-shot (test) | Accuracy | 0.8001 | 12 |
| Intent Detection | CLINC 5-shot (test) | Accuracy | 90.49 | 12 |
| Intent Detection | BANKING 5-shot (test) | Accuracy | 77.75 | 12 |
| Intent Detection | HWU Full (test) | Accuracy | 93.03 | 11 |
| Intent Detection | CLINC Full (test) | Accuracy | 97.31 | 11 |
Showing 10 of 13 rows
