
Temporal Guidance for Large Language Models

About

Contrastive Decoding (CD) improves the generation quality of large language models (LLMs) but incurs significant additional computational overhead because it requires an auxiliary model. Existing internal self-contrastive decoding methods, such as Decoding by Contrasting Layers (DoLa), exploit discrepancies across layers but are notably unstable on small-scale models. In this work, motivated by the observation that LLMs exhibit local preferences, we propose a novel contrastive guidance strategy along the temporal dimension, namely Temporal Guidance (TeGu). Our method leverages Multi-Token Prediction (MTP) to construct weaker amateur predictions for model self-contrast. To standardize the implementation of this mechanism, we further introduce a lightweight Conditional MTP Projector (cMTPP), which avoids maintaining multiple independent networks as other MTP modules require. Across various model series and benchmarks, TeGu achieves significant performance improvements while keeping additional memory consumption and computational overhead low.
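The abstract describes contrasting an expert distribution with a weaker "amateur" one at decoding time. TeGu's MTP-based amateur and the cMTPP module are not reproduced here; the sketch below only illustrates the generic contrastive-decoding scoring rule that such methods build on, assuming log-probability subtraction with a plausibility cutoff (the `alpha` and `beta` hyperparameters are illustrative, not the paper's values).

```python
import numpy as np

def contrastive_scores(expert_logits, amateur_logits, alpha=0.1, beta=1.0):
    """Sketch of a generic contrastive-decoding score.

    Tokens whose expert probability falls below alpha * (max expert prob)
    are masked out (the plausibility constraint); surviving tokens are
    scored by the gap between expert and amateur log-probabilities, so
    tokens the weak model also favors are penalized.
    """
    def log_softmax(x):
        x = x - x.max()                       # numerical stability
        return x - np.log(np.exp(x).sum())

    log_p_expert = log_softmax(np.asarray(expert_logits, dtype=float))
    log_p_amateur = log_softmax(np.asarray(amateur_logits, dtype=float))

    # Plausibility constraint: keep only tokens the expert finds likely.
    cutoff = np.log(alpha) + log_p_expert.max()
    scores = log_p_expert - beta * log_p_amateur
    scores[log_p_expert < cutoff] = -np.inf
    return scores

# Example: the expert slightly prefers token 0, but the amateur strongly
# prefers it too, so the contrastive score promotes token 1 instead.
scores = contrastive_scores([2.0, 1.0, 0.0], [2.0, 0.0, 0.0])
```

In a self-contrastive setting like TeGu, both logit vectors come from the same model at different prediction depths, so no separate amateur network needs to be loaded.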

Hong-Kai Zheng, Piji Li • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Mathematical Reasoning | GSM8K | Accuracy | 91.05 | 983 |
| Instruction Following | IFEval | Accuracy (0-100) | 34.2 | 292 |
| Code Generation | HumanEval+ | Pass@1 | 62.8 | 189 |
| Mathematical Reasoning | MATH500 | Accuracy | 24.2 | 133 |
| Mathematical Reasoning | GSM8K Platinum | Accuracy | 90.16 | 37 |
| Code Generation | HEval | Pass@1 | 68.29 | 21 |
