I Know What I Don't Know: Latent Posterior Factor Models for Multi-Evidence Probabilistic Reasoning

About

Real-world decision-making, from tax compliance assessment to medical diagnosis, requires aggregating multiple noisy and potentially contradictory evidence sources. Existing approaches either lack explicit uncertainty quantification (neural aggregation methods) or rely on manually engineered discrete predicates (probabilistic logic frameworks), limiting scalability to unstructured data. We introduce Latent Posterior Factors (LPF), a framework that transforms Variational Autoencoder (VAE) latent posteriors into soft likelihood factors for Sum-Product Network (SPN) inference, enabling tractable probabilistic reasoning over unstructured evidence while preserving calibrated uncertainty estimates. We instantiate LPF as LPF-SPN (structured factor-based inference) and LPF-Learned (end-to-end learned aggregation), enabling a principled comparison between explicit probabilistic reasoning and learned aggregation under a shared uncertainty representation. Across eight domains (seven synthetic and the FEVER benchmark), LPF-SPN achieves high accuracy (up to 97.8%), low calibration error (ECE 1.4%), and strong probabilistic fit, substantially outperforming evidential deep learning, LLM, and graph-based baselines across 15 random seeds. Contributions: (1) A framework bridging latent uncertainty representations with structured probabilistic reasoning. (2) Dual architectures enabling controlled comparison of reasoning paradigms. (3) Reproducible training methodology with seed selection. (4) Evaluation against EDL, BERT, R-GCN, and large language model baselines. (5) Cross-domain validation. (6) Formal guarantees in a companion paper.
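The core LPF idea — turning a VAE latent posterior into a soft likelihood factor and combining factors with an SPN-style product node — can be illustrated with a minimal sketch. This is not the paper's implementation: the Gaussian-threshold factor, the two-hypothesis normalization, and the example posteriors below are illustrative assumptions only.

```python
import math

def soft_factor(mu, sigma, threshold=0.0):
    # Probability mass the latent posterior N(mu, sigma^2) places above a
    # decision threshold: a soft likelihood in (0, 1) rather than a hard
    # 0/1 predicate. High variance pulls the factor toward 0.5 ("I know
    # what I don't know").
    z = (mu - threshold) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def product_node(factors):
    # SPN-style product node over independent evidence factors for the
    # "claim holds" hypothesis, normalized against its complement.
    p_true = math.prod(factors)
    p_false = math.prod(1.0 - f for f in factors)
    return p_true / (p_true + p_false)

# Hypothetical evidence sources as (mu, sigma) latent posteriors:
# confident-positive, highly uncertain, confident-negative.
posteriors = [(1.2, 0.3), (0.1, 1.5), (-1.8, 0.4)]
factors = [soft_factor(mu, sigma) for mu, sigma in posteriors]
print([round(f, 3) for f in factors], round(product_node(factors), 3))
```

Note how the uncertain source (sigma = 1.5) contributes a factor near 0.5, so it barely moves the aggregated posterior — the calibration behavior the abstract attributes to LPF.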

Aliyu Agboola Alege • 2026

Related benchmarks

Task                          Dataset                    Metric    Result  Rank
Compliance Classification     Compliance domain (test)   Accuracy  97.8    13
Fact Verification             FEVER (test)               Accuracy  99.7    10
Multi-Evidence Aggregation    Compliance                 Accuracy  97.8    9
Fact Verification             FEVER                      Accuracy  99.7    7
Compliance evaluation         Compliance domain (test)   Accuracy  97.8    5
Cross-domain generalization   Compliance (test)          Accuracy  97.8    4
Cross-domain generalization   FEVER (test)               Accuracy  99.7    4
Cross-domain generalization   Construction (test)        Accuracy  100     4
Cross-domain generalization   Healthcare (test)          Accuracy  99.3    4
Cross-domain generalization   Academic (test)            Accuracy  100     3

Showing 10 of 13 rows.

Other info

GitHub
