
Maximizing Mutual Information Between Prompt and Response Improves LLM Performance With No Additional Data

About

While post-training has successfully improved large language models (LLMs) across a variety of domains, these gains rely heavily on human-labeled data or external verifiers. Existing data has already been extensively exploited, and new data is expensive to collect. Moreover, true intelligence goes far beyond verifiable tasks. Therefore, we need self-improvement frameworks that are less dependent on external signals and more broadly applicable to both verifiable and non-verifiable domains. We propose **Mutual Information Preference Optimization (MIPO)**, a contrastive data augmentation method that constructs preference pairs by generating a positive response conditioned on the correct prompt and a negative response conditioned on a random, unrelated prompt. We show that using Direct Preference Optimization (DPO) to learn from this paired data maximizes pointwise mutual information *under the base LLM* between prompts and model responses. Experiments with 1B to 7B parameter Llama and Qwen instruct models show that MIPO achieves 3-16% gains (and a 51% increase for Qwen2.5-1B-Instruct) on personalization compared to prompting baselines. Surprisingly, MIPO can also be useful in verifiable domains, such as math and multiple-choice question answering, yielding 1-20% gains *without any additional data or external supervision*. These results suggest a promising direction for self-improvement using intrinsic signals derived from contrastive data pairs.
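As a rough illustration of the pair construction and training objective described above, here is a minimal sketch that builds MIPO-style preference pairs and evaluates the standard DPO loss on one pair. It is not the authors' implementation. `build_mipo_pairs`, `dpo_loss`, and `pmi` are hypothetical names, and `generate` is a hypothetical stand-in for sampling from the base model. The unrelated prompt is assumed to be drawn uniformly from the other prompts in the same dataset. In practice, the log-probabilities passed to `dpo_loss` would come from the policy being trained and a frozen copy of the base model. The `pmi` helper computes pointwise mutual information, log p(y|x) - log p(y), on a toy table of conditional probabilities, with prompts weighted uniformly.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def build_mipo_pairs(
    prompts: Sequence[str],
    generate: Callable[[str], str],  # hypothetical sampler for the base model
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Pair a response to each prompt with a response to a different prompt.

    Assumption: the "unrelated" prompt is drawn uniformly from the other
    prompts in the same list.
    """
    if len(prompts) < 2:
        raise ValueError("need at least two prompts to pick an unrelated one")
    rng = random.Random(seed)
    pairs = []
    for i, prompt in enumerate(prompts):
        # Draw j uniformly from every index except i.
        j = rng.randrange(len(prompts) - 1)
        if j >= i:
            j += 1
        pairs.append({
            "prompt": prompt,
            "chosen": generate(prompt),        # conditioned on the correct prompt
            "rejected": generate(prompts[j]),  # conditioned on an unrelated prompt
        })
    return pairs


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not taken from the paper
) -> float:
    """Standard DPO loss for one pair: -log sigmoid(beta * margin)."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # Numerically stable form of -log(sigmoid(z)) = log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def pmi(cond: Dict[str, Dict[str, float]], x: str, y: str) -> float:
    """Toy PMI: log p(y|x) - log p(y), where p(y) averages p(y|x') over prompts.

    `cond[x][y]` holds p(y|x); prompts are weighted uniformly (an assumption).
    """
    p_y = sum(row.get(y, 0.0) for row in cond.values()) / len(cond)
    return math.log(cond[x][y]) - math.log(p_y)


if __name__ == "__main__":
    pairs = build_mipo_pairs(["p0", "p1", "p2"], lambda p: "ans:" + p)
    for pair in pairs:
        print(pair)
    # Positive margin (chosen gained, rejected lost), so the loss is below log(2).
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
    table = {"a": {"r1": 0.9, "r2": 0.1}, "b": {"r1": 0.1, "r2": 0.9}}
    print(pmi(table, "a", "r1"))
```

A response generated for a randomly drawn prompt is a sample from the model's marginal distribution over responses. The rejected side therefore approximates p(y), while the chosen side comes from p(y|x). This contrast is the intuition behind the PMI claim in the abstract; the formal argument is given in the paper.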

Hyunji Nam, Haoran Li, Natasha Jaques · 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | ARC-Challenge zero-shot (test) | Accuracy | 90.4 | 48 |
| Personalization | Community Alignment (CA) | Personalization Win Rate | 93.67 | 45 |
| Personalization | PRISM | Personalization Win Rate | 81.62 | 45 |
| Personalization | Multi-Bench (MB) | Win Rate | 94.84 | 45 |
| Multiple-choice Question Answering | MMLU zero-shot (test) | Accuracy | 75 | 27 |
| Mathematical Reasoning | GSM8K 8-shot (test) | Accuracy | 93 | 25 |
| Mathematical Reasoning | SVAMP 8-shot (test) | Accuracy | 91.67 | 25 |
| Multiple-choice Question Answering | ARC-Easy zero-shot (test) | Accuracy | 93.8 | 25 |
