Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration
About
Conversational systems based on Large Language Models (LLMs), such as ChatGPT, show exceptional proficiency in context understanding and response generation. Despite these capabilities, they still have limitations, such as giving randomly guessed answers to ambiguous queries or failing to refuse users' requests, both of which are considered aspects of a conversational agent's proactivity. This raises the question of whether LLM-based conversational systems are equipped to handle proactive dialogue problems. In this work, we conduct a comprehensive analysis of LLM-based conversational systems, focusing on three aspects of proactive dialogue systems: clarification, target-guided, and non-collaborative dialogues. To trigger the proactivity of LLMs, we propose the Proactive Chain-of-Thought prompting scheme, which augments LLMs with a goal-planning capability over descriptive reasoning chains. Empirical findings are discussed to promote future studies on LLM-based proactive dialogue systems.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Question Answering | ARC Challenge | -- | -- | 906 |
| Question Answering | ARC Easy | Accuracy | 79.6 | 597 |
| Dialogue Response Generation | Chronicle | B-4 | 29.2 | 38 |
| Dialogue Response Generation | MSC | B-4 Score | 32.5 | 38 |
| Response Generation | Chronicle and MSC Average | CEA | 44 | 30 |
| Charity Persuasion | P4G User Simulation | Success Rate (SR) | 68 | 16 |
| Question Answering | HotpotQA | Mean Per-Step Regret | 0.191 | 15 |
| Question Answering | SQuAD Abstract | Mean Per-Step Regret | 0.175 | 15 |
| Multi-task Knowledge Understanding | MMLU | Mean Per-Step Regret | 0.142 | 15 |
| Multiple-choice Question Answering | SciQ MC | Mean Per-Step Regret | 0.148 | 15 |