MAIGO: Mitigating Lost-in-Conversation with History-Cleaned On-Policy Self-Distillation

About

Large language models often solve tasks from a fully specified prompt but degrade when the same requirements unfold over multiple turns, known as the lost-in-conversation (LiC) gap. We trace part of this degradation to self-contamination: intermediate assistant replies enter later context and carry early deviations forward. Motivated by this mechanism, we propose MAIGO, an on-policy self-distillation method that reduces this contamination using history-cleaned references from the model's own policy. For middle turns, MAIGO removes prior assistant replies while preserving the user-visible sharded prefix; for answer turns, it distills from paired full-view references conditioned on the completed user-side dialogue. A reliability weight downweights middle-turn samples that disagree with the clean reference. MAIGO requires no verifier rewards, state labels, or inference-time scaffolding. Under the LiC paired-view protocol with deterministic verifiers, MAIGO improves Qwen2.5-7B-Instruct SHARDED accuracy from 52.8 to 66.1 and the SHARDED/FULL ratio from 66.5% to 84.1%, while keeping FULL accuracy within 2.3 points. These results show that self-contamination is a trainable component of the LiC gap.

Haoyu Zheng, Yun Zhu, Shu Yuan, Shangming Chen, Qing Wang, Wenqiao Zhang, Jun Xiao, Yueting Zhuang• 2026

Related benchmarks

Task	Dataset	Result
Code Generation	HumanEval	Accuracy85.2	212
Function Calling / Tool Use	BFCL parallel parallel-multiple Actions	Accuracy82.1	20
Multi-task Evaluation	Aggregate (GSM8K, BFCL, Spider, HumanEval)	Average Accuracy78.6	20
Text-to-SQL	Spider no-easy	Accuracy59.6	20

Showing 4 of 4 rows

Other info

Follow for update

@wizwand_team Discord