
Scalable Maximum Entropy Population Synthesis via Persistent Contrastive Divergence

About

Maximum entropy (MaxEnt) modelling provides a principled framework for generating synthetic populations from aggregate census data, without access to individual-level microdata. The bottleneck of exact-enumeration approaches is computing model expectations by explicit summation over the full tuple space $\mathcal{X}$, which becomes infeasible beyond $K \approx 20$ categorical attributes. Sampling-based alternatives exist, but they rely on Metropolis-type schemes that require proposal tuning and rejection steps. We propose GibbsPCDSolver, a stochastic replacement for this computation based on Persistent Contrastive Divergence (PCD): a persistent pool of $N$ synthetic individuals is updated by Gibbs sweeps at each gradient step, providing a stochastic approximation of the model expectations without ever materialising $\mathcal{X}$. We validate the approach on controlled benchmarks and on Syn-ISTAT, a $K{=}15$ Italian demographic benchmark with analytically exact marginal targets derived from ISTAT-inspired conditional probability tables. Scaling experiments across $K \in \{12, 20, 30, 40, 50\}$ confirm that GibbsPCDSolver maintains $\mathrm{MRE} \in [0.010, 0.018]$ while $|\mathcal{X}|$ grows by eighteen orders of magnitude, with runtime scaling as $O(K)$ rather than $O(|\mathcal{X}|)$. On Syn-ISTAT, GibbsPCDSolver reaches $\mathrm{MRE}{=}0.03$ on training constraints. Crucially, it also produces populations with effective sample size $N_{\mathrm{eff}} = N$, versus $N_{\mathrm{eff}} \approx 0.012\,N$ for generalised raking. This $86.8{\times}$ diversity advantage is essential for agent-based urban simulations.
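The PCD loop described above can be illustrated with a small, self-contained sketch. It assumes a log-linear MaxEnt model $p(x) \propto \exp\big(\sum_k \theta_k[x_k] + \sum_{(i,j)} \phi_{ij}[x_i, x_j]\big)$ with univariate and pairwise marginal constraints. For this model, the gradient of the MaxEnt dual is the target marginal minus the model marginal, and the persistent Gibbs pool estimates the model marginal. The function names, the constraint set, the uniform pool initialisation, the fixed learning rate and the one-sweep-per-step schedule are all illustrative assumptions, not details of GibbsPCDSolver. The brute-force `exact_marginals` helper exists only to build toy targets for $K=3$; the fitting loop itself never enumerates $\mathcal{X}$.

```python
import itertools

import numpy as np


def gibbs_sweep(pool, theta, phi, pairs, rng):
    """One systematic-scan Gibbs sweep over all K attributes, vectorised over the pool."""
    n, K = pool.shape
    for k in range(K):
        # Conditional log-potentials of attribute k for every individual: shape (n, d_k).
        logits = np.tile(theta[k], (n, 1))
        for (i, j), W in zip(pairs, phi):
            if i == k:
                logits += W[:, pool[:, j]].T
            elif j == k:
                logits += W[pool[:, i], :]
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Inverse-CDF sampling with one uniform draw per individual.
        u = rng.random((n, 1))
        idx = (probs.cumsum(axis=1) < u).sum(axis=1)
        pool[:, k] = np.minimum(idx, probs.shape[1] - 1)  # guard against rounding
    return pool


def marginals(pool, cards, pairs):
    """Empirical univariate and pairwise marginals of the current pool."""
    n = pool.shape[0]
    uni = [np.bincount(pool[:, k], minlength=d) / n for k, d in enumerate(cards)]
    bi = [
        np.bincount(pool[:, i] * cards[j] + pool[:, j], minlength=cards[i] * cards[j])
        .reshape(cards[i], cards[j]) / n
        for i, j in pairs
    ]
    return uni, bi


def exact_marginals(cards, pairs, theta, phi):
    """Brute-force marginals by enumerating the whole tuple space (toy K only)."""
    states = np.array(list(itertools.product(*[range(d) for d in cards])))
    logp = sum(theta[k][states[:, k]] for k in range(len(cards)))
    for (i, j), W in zip(pairs, phi):
        logp = logp + W[states[:, i], states[:, j]]
    p = np.exp(logp - logp.max())
    p /= p.sum()
    uni = [np.bincount(states[:, k], weights=p, minlength=d) for k, d in enumerate(cards)]
    bi = [
        np.bincount(states[:, i] * cards[j] + states[:, j], weights=p,
                    minlength=cards[i] * cards[j]).reshape(cards[i], cards[j])
        for i, j in pairs
    ]
    return uni, bi


def fit_pcd(cards, pairs, target_uni, target_bi, n_pool=5000, n_steps=500, lr=0.5, seed=0):
    """Moment matching with a persistent Gibbs pool (PCD-style stochastic gradient)."""
    rng = np.random.default_rng(seed)
    theta = [np.zeros(d) for d in cards]
    phi = [np.zeros((cards[i], cards[j])) for i, j in pairs]
    # Persistent pool: initialised uniformly once and never reset between steps.
    pool = np.column_stack([rng.integers(0, d, size=n_pool) for d in cards])
    for _ in range(n_steps):
        pool = gibbs_sweep(pool, theta, phi, pairs, rng)
        uni, bi = marginals(pool, cards, pairs)
        # Dual gradient ascent: target moments minus the pool estimate of model moments.
        for k in range(len(cards)):
            theta[k] += lr * (target_uni[k] - uni[k])
        for m in range(len(pairs)):
            phi[m] += lr * (target_bi[m] - bi[m])
    return theta, phi, pool


# Toy check: K=3 attributes with targets generated by a known model.
cards, pairs = [2, 3, 2], [(0, 1), (1, 2)]
rng0 = np.random.default_rng(1)
true_theta = [rng0.normal(size=d) for d in cards]
true_phi = [rng0.normal(size=(cards[i], cards[j])) for i, j in pairs]
t_uni, t_bi = exact_marginals(cards, pairs, true_theta, true_phi)
theta, phi, pool = fit_pcd(cards, pairs, t_uni, t_bi)
uni, bi = marginals(pool, cards, pairs)
errs = [np.abs(u - t).max() for u, t in zip(uni, t_uni)]
errs += [np.abs(b - t).max() for b, t in zip(bi, t_bi)]
max_err = float(max(errs))
print(f"max absolute marginal error: {max_err:.3f}")
```

Each gradient step costs one sweep over the $K$ attributes, vectorised across the $N$ pool members. Per-step work therefore depends on $K$, $N$ and the number of constraints touching each attribute, never on $|\mathcal{X}|$. Because the pool persists between steps, the chain only has to track a slowly moving distribution rather than re-equilibrate from scratch at every step.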

Mirko Degli Esposti • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Maximum Entropy Modeling | A2 scaling experiments, ternary constraints, N=100,000, s=5 | Maximum Relative Error (MRE) | 0.01 | 10 |
| Population Synthesis Diversity Analysis | Syn-ISTAT (N=100,000, K=15) | Neff | 1.00e+5 | 2 |
| Population Synthesis | Syn-ISTAT A-ISTAT-2 (train) | MRE | 1.8 | 2 |
| Population Synthesis | Syn-ISTAT A-ISTAT-2 held-out (test) | MRE T1 | 0.349 | 2 |
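The two metrics used above, MRE and $N_{\mathrm{eff}}$, can be computed as in the following sketch, which rests on two stated assumptions. First, the table labels MRE as "Maximum Relative Error", but its exact definition is not given here, so the hypothetical helper `mre_summary` returns both the mean and the maximum of the per-constraint relative errors. Second, $N_{\mathrm{eff}}$ is taken to be Kish's effective sample size $(\sum_i w_i)^2 / \sum_i w_i^2$. Under this definition, a population of unit-weight synthetic individuals has $N_{\mathrm{eff}} = N$, and $N_{\mathrm{eff}}$ shrinks when a reweighting method such as raking concentrates weight on few records. All function names are illustrative.

```python
import numpy as np


def relative_errors(estimated, target, eps=1e-12):
    """Per-constraint relative errors |est - target| / target (zero targets are skipped)."""
    est = np.asarray(estimated, dtype=float)
    tgt = np.asarray(target, dtype=float)
    mask = tgt > eps  # relative error is undefined for zero targets
    return np.abs(est[mask] - tgt[mask]) / tgt[mask]


def mre_summary(estimated, target):
    """Return (mean, max) relative error; which one 'MRE' denotes is an assumption."""
    r = relative_errors(estimated, target)
    return float(r.mean()), float(r.max())


def kish_neff(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))


N = 100_000
# Synthetic individuals each count once, so N_eff equals N.
neff_unit = kish_neff(np.ones(N))
# Toy extreme: only 1% of records carry (equal) weight, so N_eff = 0.01 N.
w_skew = np.zeros(N)
w_skew[:1_000] = 1.0
neff_skew = kish_neff(w_skew)
mean_re, max_re = mre_summary([0.21, 0.49, 0.30], [0.20, 0.50, 0.30])
print(neff_unit, neff_skew, round(mean_re, 4), round(max_re, 4))
```

A real raking output would have non-zero, unevenly spread weights rather than this all-or-nothing toy. The example only illustrates how concentrating weight on few records drives $N_{\mathrm{eff}}$ below $N$.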
