
Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training

About

Large language models (LLMs), optimized through human feedback, have rapidly emerged as a leading paradigm for developing intelligent conversational assistants. However, despite their strong performance across many benchmarks, LLM-based agents may still lack conversational skills such as disambiguation: when faced with ambiguity, they often overhedge or implicitly guess users' true intents rather than asking clarifying questions. In task-specific settings, high-quality conversation samples are often limited, constituting a bottleneck for LLMs' ability to learn optimal dialogue action policies. We propose Action-Based Contrastive Self-Training (ACT), a quasi-online preference optimization algorithm based on Direct Preference Optimization (DPO) that enables data-efficient dialogue policy learning in multi-turn conversation modeling. We demonstrate ACT's efficacy in data-efficient tuning scenarios, even when no action labels are available, using multiple real-world conversational tasks: tabular-grounded question answering, machine reading comprehension, and AmbigSQL, a novel task for disambiguating information-seeking requests for complex SQL generation towards data analysis agents. Additionally, we propose evaluating LLMs' ability to function as conversational agents by examining whether they can implicitly recognize and reason about ambiguity in conversation. ACT demonstrates substantial conversation modeling improvements over standard tuning approaches such as supervised fine-tuning and DPO.
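Since ACT builds on DPO, the standard per-pair DPO objective is the core quantity being optimized. As a rough illustration only (not the paper's implementation), the sketch below computes that loss from sequence log-probabilities; in an ACT-style setup, the "winning" response would be the correct dialogue action (e.g., asking a clarifying question when the request is ambiguous) and the "losing" response the contrasting action. The function name and argument names are hypothetical.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard per-pair DPO loss from sequence log-probabilities.

    logp_w / logp_l: log-prob of the preferred / dispreferred response
    under the policy being trained; ref_* are the same quantities under
    the frozen reference policy. (Names are illustrative.)
    """
    # Implicit reward margin: difference of the two policy-vs-reference
    # log-ratios, scaled by the inverse-temperature beta.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Negative log-sigmoid of the margin: pushed toward 0 as the policy
    # prefers the winning response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is 0 and the loss is log 2; widening the log-ratio gap in favor of the preferred action drives the loss toward 0.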

Maximillian Chen, Ruoxi Sun, Tomas Pfister, Sercan Ö. Arık • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Science Question Answering | ScienceQA (test) | Average Accuracy 74.63 | 208 |
| Conversational SQL | CoSQL | Accuracy 52.5 | 14 |
| Scientific Question Answering | SciQA | Accuracy 81.86 | 13 |
| Mathematical Dialogue Evaluation | MathDial (test) | Accuracy 20.67 | 7 |
| Reasoning Question Answering | ARC | Accuracy 64.01 | 7 |
| Science Question Answering | OpenBookQA | Accuracy 61.06 | 7 |
| Clarifying Questions | OpenBookQA (test) | Accuracy 20 | 6 |
| Clarifying Questions | SciQA (test) | Accuracy 8 | 6 |
| Semantic Similarity | Abg-CoQA | Similarity 75.1 | 2 |
