
Temporal Guidance for Large Language Models

About

Contrastive Decoding (CD) improves the generation quality of large language models (LLMs) but incurs significant additional computational overhead because it requires an auxiliary model. Existing internal self-contrastive decoding methods, such as Decoding by Contrasting Layers (DoLa), exploit discrepancies across layers but are notably unstable on small-scale models. In this work, motivated by the observation that LLMs exhibit local preferences, we propose a novel contrastive guidance strategy along the temporal dimension, namely Temporal Guidance (TeGu). Our method leverages Multi-Token Prediction (MTP) to construct weaker amateur predictions for model self-contrast. To standardize the implementation of this mechanism, we further introduce a lightweight Conditional MTP Projector (cMTPP), which avoids maintaining multiple independent networks as other MTP modules require. Across various model series and benchmarks, TeGu achieves significant performance improvements while keeping additional memory consumption and computational overhead low.
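The abstract describes contrasting an expert distribution with a weaker "amateur" one at decoding time. TeGu's MTP-based amateur and the cMTPP module are not reproduced here; the sketch below only illustrates the generic contrastive-decoding scoring rule that such methods build on, assuming log-probability subtraction with a plausibility cutoff (the `alpha` and `beta` hyperparameters are illustrative, not the paper's values).

```python
import numpy as np

def contrastive_scores(expert_logits, amateur_logits, alpha=0.1, beta=1.0):
    """Sketch of a generic contrastive-decoding score.

    Tokens whose expert probability falls below alpha * (max expert prob)
    are masked out (the plausibility constraint); surviving tokens are
    scored by the gap between expert and amateur log-probabilities, so
    tokens the weak model also favors are penalized.
    """
    def log_softmax(x):
        x = x - x.max()                       # numerical stability
        return x - np.log(np.exp(x).sum())

    log_p_expert = log_softmax(np.asarray(expert_logits, dtype=float))
    log_p_amateur = log_softmax(np.asarray(amateur_logits, dtype=float))

    # Plausibility constraint: keep only tokens the expert finds likely.
    cutoff = np.log(alpha) + log_p_expert.max()
    scores = log_p_expert - beta * log_p_amateur
    scores[log_p_expert < cutoff] = -np.inf
    return scores

# Example: the expert slightly prefers token 0, but the amateur strongly
# prefers it too, so the contrastive score promotes token 1 instead.
scores = contrastive_scores([2.0, 1.0, 0.0], [2.0, 0.0, 0.0])
```

In a self-contrastive setting like TeGu, both logit vectors come from the same model at different prediction depths, so no separate amateur network needs to be loaded.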

Hong-Kai Zheng, Piji Li • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Mathematical Reasoning | GSM8K | Accuracy | 91.05 | 983 |
| Instruction Following | IFEval | Accuracy (0-100) | 34.2 | 292 |
| Code Generation | HumanEval+ | Pass@1 | 62.8 | 189 |
| Mathematical Reasoning | MATH500 | Accuracy | 24.2 | 133 |
| Mathematical Reasoning | GSM8K Platinum | Accuracy | 90.16 | 37 |
| Code Generation | HEval | Pass@1 | 68.29 | 21 |
