Maximizing Mutual Information Between Prompt and Response Improves LLM Performance With No Additional Data
About
While post-training has successfully improved large language models (LLMs) across a variety of domains, these gains rely heavily on human-labeled data or external verifiers. Existing data has already been exploited, and new data is expensive to collect. Moreover, true intelligence goes far beyond verifiable tasks. Therefore, we need self-improvement frameworks that are less dependent on external signals and more broadly applicable to both verifiable and non-verifiable domains. We propose **Mutual Information Preference Optimization (MIPO)**, a contrastive data augmentation method that constructs preference pairs by generating a positive response conditioned on the correct prompt and a negative response conditioned on a random, unrelated prompt. We show that using Direct Preference Optimization to learn from this paired data maximizes pointwise mutual information *under the base LLM* between prompts and model responses. Experiments with 1-7B parameter Llama and Qwen instruct models show that MIPO achieves 3-16% gains (and a 51% increase for Qwen2.5-1B-Instruct) on personalization compared to prompting baselines. Surprisingly, MIPO can also be useful in verifiable domains, such as math and multiple-choice question answering, yielding 1-20% gains *without any additional data or external supervision*. These results suggest a promising direction for self-improvement using intrinsic signals derived from contrastive data pairs.
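The pair construction described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_mipo_pairs`, the `generate` callable (a stand-in for sampling from the base model), and the `prompt`/`chosen`/`rejected` record layout are all assumptions made for illustration. "Unrelated" is approximated here by simply picking a different prompt from the same pool.

```python
import random
from typing import Callable, Dict, List


def build_mipo_pairs(
    prompts: List[str],
    generate: Callable[[str], str],  # hypothetical: samples a response from the base model
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Build preference pairs in the spirit of the abstract's description.

    For each prompt, the positive ("chosen") response is generated from that
    prompt, and the negative ("rejected") response is generated from a
    different, randomly drawn prompt.
    """
    if len(prompts) < 2:
        raise ValueError("need at least two prompts to draw a mismatched one")

    rng = random.Random(seed)
    pairs = []
    for i, prompt in enumerate(prompts):
        # Draw an index uniformly from all positions except i.
        j = rng.randrange(len(prompts) - 1)
        if j >= i:
            j += 1
        pairs.append(
            {
                "prompt": prompt,
                "chosen": generate(prompt),        # conditioned on the correct prompt
                "rejected": generate(prompts[j]),  # conditioned on a mismatched prompt
            }
        )
    return pairs


if __name__ == "__main__":
    # Toy stand-in for a model call, used only to make the sketch runnable.
    def toy_generate(p: str) -> str:
        return f"response to: {p}"

    for pair in build_mipo_pairs(["What is 2+2?", "Name a red fruit.", "Define entropy."], toy_generate):
        print(pair)
```

Records shaped like this could then be passed to a Direct Preference Optimization trainer. Note that drawing a different index does not guarantee a semantically unrelated prompt if the pool contains near-duplicates; how the paper handles that case is not stated in the abstract.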
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | ARC-Challenge 0-shot (test) | Accuracy | 90.4 | 48 |
| Personalization | Community Alignment (CA) | Personalization Win Rate | 93.67 | 45 |
| Personalization | PRISM | Personalization Win Rate | 81.62 | 45 |
| Personalization | Multi-Bench (MB) | Win Rate | 94.84 | 45 |
| Multiple-choice Question Answering | MMLU zero-shot (test) | Accuracy | 75 | 27 |
| Mathematical Reasoning | GSM8K 8-shot (test) | Accuracy | 93 | 25 |
| Mathematical Reasoning | SVAMP 8-shot (test) | Accuracy | 91.67 | 25 |
| Multiple-choice Question Answering | ARC-Easy zero-shot (test) | Accuracy | 93.8 | 25 |