Towards a Human-like Open-Domain Chatbot

About

We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation. Our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher in absolute SSA than the existing chatbots we evaluated.
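The two quantities the abstract relates, perplexity and SSA, can be illustrated with a minimal sketch. It assumes perplexity is the exponential of the mean per-token negative log-likelihood, and that SSA is the plain average of the sensibleness rate and the specificity rate over human-labeled responses, as the metric's name suggests. The function names, the input format, and the rule that a response counts as specific only if it is also sensible are assumptions made for illustration, not details taken from this page.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each observed next token.

    Computed as exp(mean negative log-likelihood per token).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def ssa(labels):
    """Sensibleness and Specificity Average over labeled responses.

    labels: list of (sensible, specific) boolean pairs, one per response
    (hypothetical input format).
    Returns (sensibleness, specificity, ssa) as fractions in [0, 1].
    """
    if not labels:
        raise ValueError("need at least one labeled response")
    n = len(labels)
    sensibleness = sum(1 for sensible, _ in labels if sensible) / n
    # Assumption: a response only counts as specific if it is also sensible.
    specificity = sum(1 for sensible, specific in labels if sensible and specific) / n
    return sensibleness, specificity, (sensibleness + specificity) / 2


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every observed token.
    print(perplexity([math.log(0.25)] * 4))
    # Four rated responses: three sensible, two of those also specific.
    print(ssa([(True, True), (True, False), (False, False), (True, True)]))
```

Under these assumptions, lowering perplexity and raising SSA are measured on different data (token likelihoods versus human labels), which is why the reported correlation between them is an empirical finding rather than a definitional one.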

Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le • 2020

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Text-based Visual Question Answering | TextVQA | -- | 496 |
| Visual Question Answering | GQA | Accuracy: 0.00e+0 | 374 |
| Chart Question Answering | ChartQA | -- | 229 |
| Diagram Question Answering | AI2D | AI2D Accuracy: 69.1 | 196 |
| Optical Character Recognition Benchmarking | OCRBench | -- | 109 |
| Visual Question Answering | COCO | Score: 6.2 | 21 |
| OCR-based Visual Question Answering | OCRVQA | Mean Accuracy: 0.00e+0 | 13 |
| Multi-modal Evaluation | MME-RW | Mean Accuracy: 27.6 | 10 |
| Multi-modal Hallucination Evaluation | AMBER | Mean Accuracy: 53.5 | 10 |
| Dialogue Evaluation | Human/Model Chats (test) | Engagement Score: 37 | 6 |
Showing 10 of 11 rows
