Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

MAIGO: Mitigating Lost-in-Conversation with History-Cleaned On-Policy Self-Distillation

About

Large language models often solve tasks from a fully specified prompt but degrade when the same requirements unfold over multiple turns, known as the lost-in-conversation (LiC) gap. We trace part of this degradation to self-contamination: intermediate assistant replies enter later context and carry early deviations forward. Motivated by this mechanism, we propose MAIGO, an on-policy self-distillation method that reduces this contamination using history-cleaned references from the model's own policy. For middle turns, MAIGO removes prior assistant replies while preserving the user-visible sharded prefix; for answer turns, it distills from paired full-view references conditioned on the completed user-side dialogue. A reliability weight downweights middle-turn samples that disagree with the clean reference. MAIGO requires no verifier rewards, state labels, or inference-time scaffolding. Under the LiC paired-view protocol with deterministic verifiers, MAIGO improves Qwen2.5-7B-Instruct SHARDED accuracy from 52.8 to 66.1 and the SHARDED/FULL ratio from 66.5% to 84.1%, while keeping FULL accuracy within 2.3 points. These results show that self-contamination is a trainable component of the LiC gap.

Haoyu Zheng, Yun Zhu, Shu Yuan, Shangming Chen, Qing Wang, Wenqiao Zhang, Jun Xiao, Yueting Zhuang• 2026

Related benchmarks

TaskDatasetResultRank
Code GenerationHumanEval
Accuracy85.2
115
Function Calling / Tool UseBFCL parallel parallel-multiple Actions
Accuracy82.1
20
Multi-task EvaluationAggregate (GSM8K, BFCL, Spider, HumanEval)
Average Accuracy78.6
20
Text-to-SQLSpider no-easy
Accuracy59.6
20
Showing 4 of 4 rows

Other info

Follow for update