I Know What I Don't Know: Latent Posterior Factor Models for Multi-Evidence Probabilistic Reasoning

About

Real-world decision-making, from tax compliance assessment to medical diagnosis, requires aggregating multiple noisy and potentially contradictory evidence sources. Existing approaches either lack explicit uncertainty quantification (neural aggregation methods) or rely on manually engineered discrete predicates (probabilistic logic frameworks), limiting scalability to unstructured data. We introduce Latent Posterior Factors (LPF), a framework that transforms Variational Autoencoder (VAE) latent posteriors into soft likelihood factors for Sum-Product Network (SPN) inference, enabling tractable probabilistic reasoning over unstructured evidence while preserving calibrated uncertainty estimates. We instantiate LPF as LPF-SPN (structured factor-based inference) and LPF-Learned (end-to-end learned aggregation), enabling a principled comparison between explicit probabilistic reasoning and learned aggregation under a shared uncertainty representation. Across eight domains (seven synthetic and the FEVER benchmark), LPF-SPN achieves high accuracy (up to 97.8%), low calibration error (ECE 1.4%), and strong probabilistic fit, substantially outperforming evidential deep learning, LLM, and graph-based baselines across 15 random seeds. Contributions: (1) A framework bridging latent uncertainty representations with structured probabilistic reasoning. (2) Dual architectures enabling controlled comparison of reasoning paradigms. (3) Reproducible training methodology with seed selection. (4) Evaluation against EDL, BERT, R-GCN, and large language model baselines. (5) Cross-domain validation. (6) Formal guarantees in a companion paper.
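The core LPF idea — turning a VAE latent posterior into a soft likelihood factor and combining factors with an SPN-style product node — can be illustrated with a minimal sketch. This is not the paper's implementation: the Gaussian-threshold factor, the two-hypothesis normalization, and the example posteriors below are illustrative assumptions only.

```python
import math

def soft_factor(mu, sigma, threshold=0.0):
    # Probability mass the latent posterior N(mu, sigma^2) places above a
    # decision threshold: a soft likelihood in (0, 1) rather than a hard
    # 0/1 predicate. High variance pulls the factor toward 0.5 ("I know
    # what I don't know").
    z = (mu - threshold) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def product_node(factors):
    # SPN-style product node over independent evidence factors for the
    # "claim holds" hypothesis, normalized against its complement.
    p_true = math.prod(factors)
    p_false = math.prod(1.0 - f for f in factors)
    return p_true / (p_true + p_false)

# Hypothetical evidence sources as (mu, sigma) latent posteriors:
# confident-positive, highly uncertain, confident-negative.
posteriors = [(1.2, 0.3), (0.1, 1.5), (-1.8, 0.4)]
factors = [soft_factor(mu, sigma) for mu, sigma in posteriors]
print([round(f, 3) for f in factors], round(product_node(factors), 3))
```

Note how the uncertain source (sigma = 1.5) contributes a factor near 0.5, so it barely moves the aggregated posterior — the calibration behavior the abstract attributes to LPF.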

Aliyu Agboola Alege • 2026

Related benchmarks

Task                          Dataset                    Metric    Result  Rank
Compliance Classification     Compliance domain (test)   Accuracy  97.8    13
Fact Verification             FEVER (test)               Accuracy  99.7    10
Multi-Evidence Aggregation    Compliance                 Accuracy  97.8    9
Fact Verification             FEVER                      Accuracy  99.7    7
Compliance evaluation         Compliance domain (test)   Accuracy  97.8    5
Cross-domain generalization   Compliance (test)          Accuracy  97.8    4
Cross-domain generalization   FEVER (test)               Accuracy  99.7    4
Cross-domain generalization   Construction (test)        Accuracy  100     4
Cross-domain generalization   Healthcare (test)          Accuracy  99.3    4
Cross-domain generalization   Academic (test)            Accuracy  100     3

Showing 10 of 13 rows.

Other info

GitHub
