Maximizing Mutual Information Between Prompt and Response Improves LLM Performance With No Additional Data

About

While post-training has successfully improved large language models (LLMs) across a variety of domains, these gains rely heavily on human-labeled data or external verifiers. Existing data has already been exploited, and new data is expensive to collect. Moreover, true intelligence goes far beyond verifiable tasks. We therefore need self-improvement frameworks that depend less on external signals and apply more broadly to both verifiable and non-verifiable domains. We propose **Mutual Information Preference Optimization (MIPO)**, a contrastive data augmentation method that constructs preference pairs by generating a positive response conditioned on the correct prompt and a negative response conditioned on a random, unrelated prompt. We show that using Direct Preference Optimization to learn from this paired data maximizes pointwise mutual information *under the base LLM* between prompts and model responses. Experiments with 1-7B parameter Llama and Qwen instruct models show that MIPO achieves 3-16% gains (and a 51% increase for Qwen2.5-1.5B-Instruct) on personalization compared to prompting baselines. Surprisingly, MIPO can also be useful in verifiable domains, such as math and multiple-choice question answering, yielding 1-20% gains *without any additional data or external supervision*. These results suggest a promising direction for self-improvement using intrinsic signals derived from contrastive data pairs.
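The pair construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_mipo_pairs`, the `generate` callable (standing in for sampling from the base LLM), the rule of picking the unrelated prompt uniformly at random from the other prompts in the same list, and the `prompt`/`chosen`/`rejected` record layout are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List


def build_mipo_pairs(
    prompts: List[str],
    generate: Callable[[str], str],
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Build contrastive preference pairs from prompts alone.

    For each prompt, the "chosen" response is generated from that prompt,
    and the "rejected" response is generated from a different prompt
    picked at random (an assumed way of choosing the unrelated prompt).
    """
    if len(prompts) < 2:
        raise ValueError("need at least two prompts to form a contrastive pair")

    rng = random.Random(seed)
    pairs = []
    for i, prompt in enumerate(prompts):
        # Draw an index uniformly from all positions except i.
        j = rng.randrange(len(prompts) - 1)
        if j >= i:
            j += 1
        pairs.append(
            {
                "prompt": prompt,
                "chosen": generate(prompt),        # conditioned on the correct prompt
                "rejected": generate(prompts[j]),  # conditioned on an unrelated prompt
            }
        )
    return pairs


if __name__ == "__main__":
    # Toy stand-in for an LLM call; a real setup would sample from the base model.
    def toy_generate(prompt: str) -> str:
        return f"response to: {prompt}"

    demo = build_mipo_pairs(
        ["Suggest a vegan dinner.", "Solve 12 * 7.", "Name a jazz album."],
        toy_generate,
    )
    for record in demo:
        print(record)
```

Each record could then be passed to a Direct Preference Optimization trainer as an ordinary preference example. Because the rejected response is drawn under a randomly chosen prompt, it behaves approximately like a sample from the model's marginal response distribution, which is the denominator of the pointwise mutual information log π(y|x)/π(y) between prompt x and response y.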

Hyunji Nam, Haoran Li, Natasha Jaques • 2026

Related benchmarks

Task	Dataset	Result
Question Answering	ARC-Challenge 0-shot (test)	Accuracy: 90.4	48
Personalization	Community Alignment (CA)	Personalization Win-Rate: 93.67	45
Personalization	PRISM	Personalization Win Rate: 81.62	45
Personalization	Multi-Bench (MB)	Win Rate: 94.84	45
Multiple-choice Question Answering	MMLU zero-shot (test)	Accuracy (MMLU zero-shot): 75	27
Mathematical Reasoning	GSM8K 8-shot (test)	Accuracy: 93	25
Mathematical Reasoning	SVAMP 8-shot (test)	Accuracy: 91.67	25
Multiple-choice Question Answering	ARC-Easy zero-shot (test)	Accuracy: 93.8	25
