Beyond Goldfish Memory: Long-Term Open-Domain Conversation

About

Despite recent improvements in open-domain dialogue models, state-of-the-art models are trained and evaluated on short conversations with little context. In contrast, the long-term conversation setting has hardly been studied. In this work we collect and release a human-human dataset consisting of multiple chat sessions whereby the speaking partners learn about each other's interests and discuss the things they have learnt from past sessions. We show how existing models trained on existing datasets perform poorly in this long-term conversation setting in both automatic and human evaluations, and we study long-context models that can perform much better. In particular, we find that retrieval-augmented methods and methods with an ability to summarize and recall previous conversations outperform the standard encoder-decoder architectures currently considered state of the art.

Jing Xu, Arthur Szlam, Jason Weston • 2021
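As a rough illustration of the summarize-and-recall idea described in the abstract, the sketch below stores short summaries ("memories") of earlier sessions and prepends them to the current dialogue context. This is a minimal sketch under assumed interfaces, not the authors' implementation: the class and function names (Session, summarize_session, build_context) and the trivial length-based summarizer are hypothetical placeholders.

```python
# Illustrative sketch (not the paper's code): carrying information across
# chat sessions by summarizing past sessions and recalling the summaries
# into the context of the current session.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Session:
    """A single chat session: a list of (speaker, utterance) turns."""
    turns: List[Tuple[str, str]] = field(default_factory=list)


def summarize_session(session: Session) -> List[str]:
    """Stand-in summarizer: keeps short, fact-like utterances as memories.
    A real system would use a trained abstractive summarizer instead."""
    return [text for _, text in session.turns if len(text.split()) <= 12]


def build_context(memories: List[str],
                  current_turns: List[Tuple[str, str]],
                  max_memories: int = 5) -> str:
    """Prepend recalled memories from earlier sessions to the current
    dialogue, so a fixed-context model can still use past sessions."""
    recalled = memories[-max_memories:]
    lines = [f"memory: {m}" for m in recalled]
    lines += [f"{speaker}: {text}" for speaker, text in current_turns]
    return "\n".join(lines)


if __name__ == "__main__":
    past = Session(turns=[("A", "I just adopted a golden retriever."),
                          ("B", "Nice! I mostly go hiking on weekends.")])
    memories = summarize_session(past)
    print(build_context(memories, [("A", "How was your weekend?")]))
```

A retrieval-augmented variant would follow the same pattern, but select which memories to recall by scoring them against the current turns rather than taking the most recent ones.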

Related benchmarks

Task | Dataset | Result | Rank
Dialogue Response Generation | KEEM (KMSC memories) 1.0 (test) | Perplexity: 9.51 | 14
Dialogue Modeling | MSC (Multi-Session Chat) 1.0 (test) | Session 1 Perplexity: 8.2 | 10
Language Modeling | MSC Session 2 1.0 (val) | Perplexity: 9.08 | 10
Language Modeling | MSC Session 3 1.0 (val) | Perplexity: 8.96 | 10
Language Modeling | MSC Session 4 1.0 (val) | Perplexity: 9.07 | 10
Language Modeling | MSC Session 5 1.0 (val) | Perplexity: 8.99 | 10
Language Modeling | MSC Session Openings 1.0 (val) | Perplexity: 7.78 | 10
Language Modeling | MSC Session 1 1.0 (val) | Perplexity: 8.16 | 10
Dialogue Response Generation | KEEM memories 1.0 (test) | Perplexity: 7.88 | 7
Multi-Session Dialogue Generation | MSC (Multi-Session Chat) Human Evaluation 1.0 (test) | Reference Own Topic: 24.2 | 6
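Most of the results above are reported as perplexity. As a reminder of what that metric measures, the snippet below shows the standard per-token computation (lower is better); the token probabilities are invented purely for illustration.

```python
# Minimal sketch: perplexity = exp(mean negative log-likelihood per token).
import math

def perplexity(token_probs):
    """Exponential of the average negative log-probability over tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Example with made-up probabilities the model assigns to reference tokens.
print(round(perplexity([0.3, 0.12, 0.45, 0.2]), 2))
```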

Other info

Code
