
Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training

About

Large language models (LLMs), optimized through human feedback, have rapidly emerged as a leading paradigm for developing intelligent conversational assistants. However, despite their strong performance across many benchmarks, LLM-based agents may still lack conversational skills such as disambiguation: when faced with ambiguity, they often overhedge or implicitly guess users' true intents rather than asking clarifying questions. In task-specific settings, high-quality conversation samples are often limited, constituting a bottleneck for LLMs' ability to learn optimal dialogue action policies. We propose Action-Based Contrastive Self-Training (ACT), a quasi-online preference optimization algorithm based on Direct Preference Optimization (DPO) that enables data-efficient dialogue policy learning in multi-turn conversation modeling. We demonstrate ACT's efficacy in data-efficient tuning scenarios, even when no action labels are available, using multiple real-world conversational tasks: tabular-grounded question answering, machine reading comprehension, and AmbigSQL, a novel task for disambiguating information-seeking requests for complex SQL generation towards data analysis agents. Additionally, we propose evaluating LLMs' ability to function as conversational agents by examining whether they can implicitly recognize and reason about ambiguity in conversation. ACT demonstrates substantial conversation modeling improvements over standard tuning approaches such as supervised fine-tuning and DPO.
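Since ACT builds on DPO, the standard per-pair DPO objective is the core quantity being optimized. As a rough illustration only (not the paper's implementation), the sketch below computes that loss from sequence log-probabilities; in an ACT-style setup, the "winning" response would be the correct dialogue action (e.g., asking a clarifying question when the request is ambiguous) and the "losing" response the contrasting action. The function name and argument names are hypothetical.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard per-pair DPO loss from sequence log-probabilities.

    logp_w / logp_l: log-prob of the preferred / dispreferred response
    under the policy being trained; ref_* are the same quantities under
    the frozen reference policy. (Names are illustrative.)
    """
    # Implicit reward margin: difference of the two policy-vs-reference
    # log-ratios, scaled by the inverse-temperature beta.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Negative log-sigmoid of the margin: pushed toward 0 as the policy
    # prefers the winning response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is 0 and the loss is log 2; widening the log-ratio gap in favor of the preferred action drives the loss toward 0.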

Maximillian Chen, Ruoxi Sun, Tomas Pfister, Sercan Ö. Arık • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Science Question Answering | ScienceQA (test) | Average Accuracy 74.63 | 208 |
| Conversational SQL | CoSQL | Accuracy 52.5 | 14 |
| Scientific Question Answering | SciQA | Accuracy 81.86 | 13 |
| Mathematical Dialogue Evaluation | MathDial (test) | Accuracy 20.67 | 7 |
| Reasoning Question Answering | ARC | Accuracy 64.01 | 7 |
| Science Question Answering | OpenBookQA | Accuracy 61.06 | 7 |
| Clarifying Questions | OpenBookQA (test) | Accuracy 20 | 6 |
| Clarifying Questions | SciQA (test) | Accuracy 8 | 6 |
| Semantic Similarity | Abg-CoQA | Similarity 75.1 | 2 |
