The Second Conversational Intelligence Challenge (ConvAI2)

About

We describe the setting and results of the ConvAI2 NeurIPS competition, which aims to advance the state of the art in open-domain chatbots. Some key takeaways from the competition are: (i) pretrained Transformer variants are currently the best-performing models on this task, but (ii) to improve performance on multi-turn conversations with humans, future systems must go beyond single-word metrics like perplexity and measure performance across sequences of utterances (conversations) in terms of repetition, consistency, and balance of dialogue acts (e.g. how many questions are asked vs. answered).
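The contrast in takeaway (ii), between a per-word metric and conversation-level measures, can be illustrated with a minimal sketch. It assumes whitespace tokenization and uses hypothetical helper names (`perplexity`, `repeated_ngram_rate`, `question_rate`); it is not the competition's evaluation code, and the repetition and question measures are crude stand-ins for the properties named above.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """Per-word metric: exp of the mean negative log-likelihood (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def repeated_ngram_rate(utterances, n=3):
    """Conversation-level: fraction of n-grams that already occurred earlier
    in the same speaker's utterances (assumed proxy for repetition)."""
    seen = Counter()
    total = 0
    repeated = 0
    for utterance in utterances:
        tokens = utterance.lower().split()  # assumption: whitespace tokens
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            total += 1
            if seen[gram] > 0:
                repeated += 1
            seen[gram] += 1
    return repeated / total if total else 0.0


def question_rate(utterances):
    """Conversation-level: share of utterances ending in '?'
    (assumed proxy for the balance of dialogue acts)."""
    if not utterances:
        return 0.0
    questions = sum(1 for u in utterances if u.strip().endswith("?"))
    return questions / len(utterances)


# Hypothetical bot turns from a single conversation.
bot_turns = [
    "i love dogs . do you have any pets ?",
    "i love dogs . what do you do ?",
    "that is cool . do you have any pets ?",
]
print(repeated_ngram_rate(bot_turns), question_rate(bot_turns))
```

A model could score well on `perplexity` for each individual utterance and still repeat itself or ask a question on every turn, which only the two conversation-level functions would reveal.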

Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W Black, Alexander Rudnicky, Jason Williams, Joelle Pineau, Mikhail Burtsev, Jason Weston • 2019

Related benchmarks

Task | Dataset | Result | Rank
Dialogue Generation | PERSONA-CHAT original (dev) | Hits@1 17.3 | 13
Dialogue Generation | PERSONA-CHAT Revised (dev) | Hits@1 16.3 | 11
Dialogue Policy Evaluation | PersonaChat (test) | USR RET 87.9 | 10
Persona-based Dialogue | ConvAI2 (test) | Hits@1 55.1 | 10
Dialogue Policy Evaluation | Empathetic Dialogues (test) | USR MLM 0.644 | 8
Dialogue Policy Evaluation | DailyDialog (test) | USR MLM 8 | 8
Dialogue Generation | PERSONA-CHAT original (dev) | Category 1 Score 26.3 | 3
Dialogue Response Generation | ConvAI2 (val) | F1 Score 19.09 | 3
