Interpretable epistemic uncertainty decomposition in sequential generative models via polynomial chaos surrogates

About

Sequential generative models conditioned on uncertain rewards are central to AI-driven scientific discovery, yet the epistemic uncertainty they inherit from imperfect reward estimates remains unquantified. We propagate this uncertainty through generative flow networks (GFlowNets) by fitting polynomial chaos expansions (PCEs) to small ensembles of trained models. The PCE coefficients yield analytical Sobol sensitivity indices, providing the first interpretable decomposition of which reward components drive which generative decisions, a capability unavailable from deep ensembles, Bayesian neural networks, or Monte Carlo dropout. Convergence guarantees are established theoretically, and four of the five are formally verified in the Lean 4 proof assistant. Across three real-world tasks, the framework reveals actionable structure invisible to ensembles alone. On the Doyle-Dreher Buchwald-Hartwig dataset, catalyst selection is robust ($D_{\mathrm{catalyst}}\approx 71$) while additive selection is fragile ($D_{\mathrm{additive}}\approx 179$, $2.5\times$ higher). In fragment-based molecular design, the linker position is the most sensitive ($D_{\mathrm{linker}}\approx 28$) while decoration positions are the most robust ($D\approx 14$–$18$), reversing the conventional scaffold-robust / decoration-fragile assumption. On the Sachs protein signalling network, MAPK-cascade edges and PKA/PKC hub edges separate into distinct sensitivity regimes, providing a targeted map for perturbation experiments. Calibration coverage at the 95% level reaches 0.97–1.00 across the dominant steps, and the surrogate evaluates 10,000 policy samples in milliseconds, $10^{3}$–$10^{4}\times$ faster than exhaustive retraining.
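The step from PCE coefficients to Sobol indices can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the basis, the input distribution, or the fitting procedure, so the sketch assumes standard-normal reward perturbations, a probabilists' Hermite basis, and an ordinary least-squares fit. All function names are hypothetical, and a toy polynomial stands in for the output of an ensemble of trained GFlowNets.

```python
import itertools
import math

import numpy as np
from numpy.polynomial.hermite_e import hermeval


def multi_indices(dim, degree):
    """All multi-indices alpha with |alpha| <= degree (total-degree basis)."""
    return [a for a in itertools.product(range(degree + 1), repeat=dim)
            if sum(a) <= degree]


def design_matrix(xi, alphas):
    """Evaluate each product-Hermite basis function at every sample row."""
    cols = []
    for alpha in alphas:
        col = np.ones(xi.shape[0])
        for j, deg in enumerate(alpha):
            coeffs = np.zeros(deg + 1)
            coeffs[deg] = 1.0  # selects He_deg
            col = col * hermeval(xi[:, j], coeffs)
        cols.append(col)
    return np.column_stack(cols)


def fit_pce(xi, y, degree):
    """Least-squares PCE fit (an assumed fitting choice)."""
    alphas = multi_indices(xi.shape[1], degree)
    coef, *_ = np.linalg.lstsq(design_matrix(xi, alphas), y, rcond=None)
    return alphas, coef


def sobol_indices(alphas, coef):
    """First-order and total Sobol indices read off the PCE coefficients."""
    dim = len(alphas[0])
    # E[He_n(x)^2] = n! for x ~ N(0, 1), so each term contributes c^2 * prod(n!).
    norms = np.array([math.prod(math.factorial(k) for k in a) for a in alphas])
    contrib = coef ** 2 * norms
    first, total, variance = np.zeros(dim), np.zeros(dim), 0.0
    for alpha, c in zip(alphas, contrib):
        active = [j for j, k in enumerate(alpha) if k > 0]
        if not active:
            continue  # the constant term carries the mean, not variance
        variance += c
        for j in active:
            total[j] += c
        if len(active) == 1:
            first[active[0]] += c
    return first / variance, total / variance


# Toy stand-in for "ensemble member -> policy quantity" (an assumption).
rng = np.random.default_rng(0)
xi = rng.standard_normal((200, 3))  # three hypothetical reward components
y = 2.0 * xi[:, 0] + xi[:, 1] + 0.5 * xi[:, 0] * xi[:, 2]

alphas, coef = fit_pce(xi, y, degree=2)
first, total = sobol_indices(alphas, coef)
print("first-order:", np.round(first, 3))
print("total:      ", np.round(total, 3))
```

For this toy function the variance splits analytically as 4 + 1 + 0.25, so the first-order indices are 4/5.25, 1/5.25 and 0, and the third component shows up only in its total index through the interaction term. Because the indices come from the coefficients alone, no extra sampling of the surrogate is needed once the fit is done.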

Ramón Nartallo-Kaluarachchi, Shashanka Ubaru, Małgorzata J Zimoń, Dongsung Huh, Robert Manson-Sawko, Lior Horesh, Yoshua Bengio • 2025

Related benchmarks

Task	Dataset	Result
Surrogate policy prediction	Sachs ensemble (test)	Policy MAE: 0.016	3
Surrogate Policy Generation	Buchwald-Hartwig	Policy MAE: 0.153	3
Surrogate Policy Generation	Discrete grid	Ensemble Training Time: 0.5	1
Surrogate Policy Generation	Continuous grid	Ensemble Training Time (h): 0.3	1
Surrogate Policy Generation	Symbolic Regression	Ensemble Training Time (h): 0.75	1
Surrogate Policy Generation	LLM GFlowNet	Ensemble Training Time (h): 0.5	1
Surrogate Policy Generation	Sachs 11-node	Ensemble Training Time (h): 0.5	1
