posteriordb: Testing, Benchmarking and Developing Bayesian Inference Algorithms

About

The generality and robustness of inference algorithms are critical to the success of widely used probabilistic programming languages such as Stan, PyMC, Pyro, and Turing.jl. When designing a new general-purpose inference algorithm, whether it involves Monte Carlo sampling or variational approximation, a fundamental problem is evaluating its accuracy and efficiency across a range of representative target models. To address this problem, we propose posteriordb, a database of models and data sets that define target densities, along with reference Monte Carlo draws. We also provide a guide to best practices for using posteriordb in model evaluation and comparison. To cover a wide range of realistic target densities, posteriordb currently comprises 120 representative models, and it has been instrumental in developing several general inference algorithms.
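Comparing an algorithm's output against reference Monte Carlo draws can be as simple as checking how far its posterior means sit from the reference means, measured in reference posterior standard deviations. The sketch below assumes the draws have already been loaded as a dict mapping parameter names to lists of floats; how they are read out of posteriordb is not shown, and the metric and the name `standardized_mean_errors` are illustrative choices, not necessarily the ones recommended by posteriordb.

```python
import statistics
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of posterior draws.
Draws = Dict[str, List[float]]


def standardized_mean_errors(approx: Draws, reference: Draws) -> Dict[str, float]:
    """For each parameter, return |mean(approx) - mean(reference)| / sd(reference).

    Illustrative accuracy metric only; it ignores tails and correlations.
    """
    errors: Dict[str, float] = {}
    for name, ref in reference.items():
        if name not in approx:
            raise KeyError(f"approximate draws missing parameter {name!r}")
        ref_mean = statistics.fmean(ref)
        ref_sd = statistics.stdev(ref)  # sample standard deviation of reference draws
        approx_mean = statistics.fmean(approx[name])
        errors[name] = abs(approx_mean - ref_mean) / ref_sd
    return errors


if __name__ == "__main__":
    reference = {"mu": [0.0, 1.0, 2.0, 3.0, 4.0]}
    approx = {"mu": [2.0, 3.0, 4.0]}
    print(standardized_mean_errors(approx, reference))
```

Scaling by the reference standard deviation makes errors comparable across parameters on very different scales, which matters when summarizing performance over many target models.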

Måns Magnusson, Jakob Torgander, Paul-Christian Bürkner, Lu Zhang, Bob Carpenter, Aki Vehtari • 2024

Related benchmarks

Task	Dataset	Metric	Result
Probabilistic Program Generation	Peregrine PosteriorDB	ELPD LOO	-112.6	4
Probabilistic Program Generation	Eight schools PosteriorDB	ELPD LOO	-30.7	4
Probabilistic Program Generation	Dugongs PosteriorDB	ELPD LOO	22.43	4
Probabilistic Program Generation	PosteriorDB Surgical	ELPD LOO	-39.73	4
Probabilistic Program Generation	PosteriorDB GP	ELPD LOO	-26.53	3
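ELPD LOO is the expected log pointwise predictive density under leave-one-out cross-validation, the sum over observations of log p(y_i | y_{-i}); higher values are better. How the values above were computed is not stated here. As a minimal sketch, assuming a matrix of pointwise log-likelihoods log p(y_i | theta_s) from full-posterior draws, plain importance sampling estimates each term as -log of the mean of 1 / p(y_i | theta_s). This raw estimator can have very high variance; Pareto-smoothed importance sampling (PSIS-LOO, as in the R package loo or ArviZ's `az.loo`) stabilizes the weights and is what is normally used. The function name `elpd_loo_is` is hypothetical.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) for a non-empty sequence."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def elpd_loo_is(log_lik: List[List[float]]) -> float:
    """Plain importance-sampling LOO estimate (no Pareto smoothing).

    log_lik[s][i] = log p(y_i | theta_s) for posterior draw s and observation i.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    total = 0.0
    for i in range(n_obs):
        # Log importance ratios r_s = 1 / p(y_i | theta_s).
        log_ratios = [-log_lik[s][i] for s in range(n_draws)]
        # log p(y_i | y_-i) ~= -log((1/S) * sum_s r_s)
        total += -(logsumexp(log_ratios) - math.log(n_draws))
    return total


if __name__ == "__main__":
    # Two draws with likelihoods 0.5 and 0.25 for one observation: -log(3).
    print(elpd_loo_is([[math.log(0.5)], [math.log(0.25)]]))
```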
