
Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models

About

Understanding dependencies between variables is critical for interpretability and efficient generation in masked diffusion models (MDMs), yet these models primarily expose marginal conditional distributions and do not explicitly represent inter-variable dependence. We propose a neural framework for estimating pairwise conditional mutual information (MI) directly from the hidden states of a pretrained MDM, using ground-truth MI computed from the model's own conditional distributions for supervision. The resulting estimator captures the model's internal belief about dependency structure and predicts the full MI matrix in a single forward pass, enabling MI-guided parallel decoding by identifying conditionally independent subsets of variables. We evaluate our approach on Sudoku and protein sequence generation with ESM-C, where the MI maps recover known structural constraints and enable a 3-5x reduction in inference-time forward passes compared to sequential decoding, while preserving generative quality and outperforming entropy-based parallelization methods.
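To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the joint distribution of two masked positions is already available as a probability table (for example, built from the model's conditionals as p(x_i | context) · p(x_j | x_i, context), which is an assumption about how the supervision target could be formed). The function names `mutual_information` and `select_parallel_subset`, the greedy rule, and the threshold value are all hypothetical and chosen only for illustration.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of two discrete variables.

    `joint` is a table where joint[a][b] = P(X = a, Y = b) and all
    entries sum to 1. How this table is obtained from a masked model
    is an assumption outside this sketch.
    """
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0.0:                       # 0 * log(0) is treated as 0
                mi += p * math.log(p / (px[a] * py[b]))
    return mi


def select_parallel_subset(mi_matrix, candidates, threshold):
    """Hypothetical greedy rule: keep a position only if its MI with
    every position already kept is below `threshold`."""
    chosen = []
    for pos in candidates:
        if all(mi_matrix[pos][q] < threshold for q in chosen):
            chosen.append(pos)
    return chosen


# Two independent fair bits carry no mutual information.
independent = [[0.25, 0.25], [0.25, 0.25]]
# Two perfectly coupled fair bits carry log(2) nats.
coupled = [[0.5, 0.0], [0.0, 0.5]]

# Toy 3-position MI matrix: positions 0 and 1 are strongly dependent,
# position 2 is nearly independent of both.
toy_mi = [
    [0.00, 0.60, 0.01],
    [0.60, 0.00, 0.02],
    [0.01, 0.02, 0.00],
]
subset = select_parallel_subset(toy_mi, candidates=[0, 1, 2], threshold=0.1)
```

In this toy case `subset` keeps positions 0 and 2 and defers position 1, because decoding 0 and 1 in the same step would ignore their strong dependence. An estimator that predicts the whole MI matrix in one forward pass, as the abstract describes, would supply `mi_matrix` directly rather than computing each entry from a joint table.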

Jai Sharma, Yifan Wang, Bryan Li • 2026

Related benchmarks

Task                         | Dataset                                 | Result                | Rank
Sudoku Generation            | 1000 unseen hard Sudoku puzzles (test)  | Average Passes: 15.2  | 7
Protein Sequence Generation  | ESM-C                                   | Average Passes: 15.3  | 6
