
Data Selection for Multi-turn Dialogue Instruction Tuning

About

Instruction-tuned language models increasingly rely on large multi-turn dialogue corpora, but these datasets are often noisy and structurally inconsistent, with topic drift, repetitive chitchat, and mismatched answer formats across turns. We address this from a data selection perspective and propose MDS (Multi-turn Dialogue Selection), a dialogue-level framework that scores whole conversations rather than isolated turns. MDS combines a global coverage stage, which performs bin-wise selection in the user-query trajectory space to retain representative yet non-redundant dialogues, with a local structural stage, which evaluates within-dialogue reliability through entity-grounded topic consistency and information progress, together with query-answer form consistency for functional alignment. MDS outperforms strong single-turn selectors, dialogue-level LLM scorers, and heuristic baselines on three multi-turn benchmarks and an in-domain Banking test set, achieving the best overall rank across reference-free and reference-based metrics, and is more robust on long conversations under the same training budget. Code and resources are included in the supplementary materials.
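The global coverage stage described above can be illustrated with a toy sketch: embed each dialogue's user-query trajectory, partition the embedding space into bins, and keep the top-scoring dialogues per bin so that the selected subset stays representative but non-redundant. The one-dimensional projection, equal-width binning, and per-bin quota below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def binwise_select(traj_embeddings, scores, n_bins=4, per_bin=1):
    """Toy bin-wise coverage selection (illustrative, not the paper's method).

    Projects trajectory embeddings onto their first principal direction,
    splits that axis into equal-width bins, and keeps the highest-scoring
    dialogues in each non-empty bin.
    """
    X = np.asarray(traj_embeddings, dtype=float)
    X = X - X.mean(axis=0)                       # center before PCA
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ vt[0]                             # 1-D coverage coordinate
    edges = np.linspace(proj.min(), proj.max(), n_bins + 1)
    bin_ids = np.clip(np.digitize(proj, edges[1:-1]), 0, n_bins - 1)
    selected = []
    for b in range(n_bins):
        idx = np.where(bin_ids == b)[0]
        if idx.size == 0:
            continue                             # empty bin: nothing to keep
        top = idx[np.argsort(-np.asarray(scores)[idx])][:per_bin]
        selected.extend(top.tolist())
    return sorted(selected)
```

With two well-separated clusters of trajectories and one slot per bin, the sketch keeps the best-scored dialogue from each cluster rather than the globally top-scored (and possibly redundant) pair.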

Bo Li, Shikun Zhang, Wei Ye • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Multi-turn dialogue | MT-Eval | LLM-EVAL Score | 8.16 | 20
Multi-turn dialogue | ConsistentChat | LLM-EVAL Score | 8.52 | 20
Multi-turn dialogue | TopDial | LLM-EVAL Score | 7.32 | 20
Dialogue Evaluation | Banking (test) | G-E Score | 6.72 | 10
Dialogue Evaluation | ConsistentChat (test) | G-E Score | 7.3 | 10
