
Data Selection for Multi-turn Dialogue Instruction Tuning

About

Instruction-tuned language models increasingly rely on large multi-turn dialogue corpora, but these datasets are often noisy and structurally inconsistent, with topic drift, repetitive chitchat, and mismatched answer formats across turns. We address this from a data selection perspective and propose MDS (Multi-turn Dialogue Selection), a dialogue-level framework that scores whole conversations rather than isolated turns. MDS combines a global coverage stage, which performs bin-wise selection in the user-query trajectory space to retain representative yet non-redundant dialogues, with a local structural stage, which evaluates within-dialogue reliability through entity-grounded topic consistency and information progress, together with query-answer form consistency for functional alignment. MDS outperforms strong single-turn selectors, dialogue-level LLM scorers, and heuristic baselines on three multi-turn benchmarks and an in-domain Banking test set, achieving the best overall rank across reference-free and reference-based metrics, and is more robust on long conversations under the same training budget. Code and resources are included in the supplementary materials.
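The global coverage stage described above can be illustrated with a toy sketch: embed each dialogue's user-query trajectory, partition the embedding space into bins, and keep the top-scoring dialogues per bin so that the selected subset stays representative but non-redundant. The one-dimensional projection, equal-width binning, and per-bin quota below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def binwise_select(traj_embeddings, scores, n_bins=4, per_bin=1):
    """Toy bin-wise coverage selection (illustrative, not the paper's method).

    Projects trajectory embeddings onto their first principal direction,
    splits that axis into equal-width bins, and keeps the highest-scoring
    dialogues in each non-empty bin.
    """
    X = np.asarray(traj_embeddings, dtype=float)
    X = X - X.mean(axis=0)                       # center before PCA
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ vt[0]                             # 1-D coverage coordinate
    edges = np.linspace(proj.min(), proj.max(), n_bins + 1)
    bin_ids = np.clip(np.digitize(proj, edges[1:-1]), 0, n_bins - 1)
    selected = []
    for b in range(n_bins):
        idx = np.where(bin_ids == b)[0]
        if idx.size == 0:
            continue                             # empty bin: nothing to keep
        top = idx[np.argsort(-np.asarray(scores)[idx])][:per_bin]
        selected.extend(top.tolist())
    return sorted(selected)
```

With two well-separated clusters of trajectories and one slot per bin, the sketch keeps the best-scored dialogue from each cluster rather than the globally top-scored (and possibly redundant) pair.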

Bo Li, Shikun Zhang, Wei Ye • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Multi-turn dialogue | MT-Eval | LLM-EVAL Score | 8.16 | 20
Multi-turn dialogue | ConsistentChat | LLM-EVAL Score | 8.52 | 20
Multi-turn dialogue | TopDial | LLM-EVAL Score | 7.32 | 20
Dialogue Evaluation | Banking (test) | G-E Score | 6.72 | 10
Dialogue Evaluation | ConsistentChat (test) | G-E Score | 7.3 | 10
