Symbolic Density Estimation for Discrete Distributions
About
Discrete probability laws underpin statistical modeling, yet the catalog of interpretable distributions has expanded only gradually through centuries of case-by-case mathematical derivations. We introduce symbolic density estimation (SDE), an unsupervised framework that automatically recovers closed-form probability mass functions by composing elementary analytic operations within a structured search space. Our method integrates domain-specific structural priors with evolutionary search and a validity-aware inference stage, and it extends to richer distribution families such as zero-inflated models and finite mixtures. To support systematic evaluation and future research, we contribute a benchmark dataset spanning a broad collection of commonly used discrete distributions. The proposed algorithm recovers all benchmark families with accurate parameter estimates. A real-data application shows that it identifies concise and interpretable mixture models that improve goodness-of-fit over standard models.
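To make the ideas of composing closed-form probability mass functions and checking their validity concrete, here is a minimal Python sketch. It is not the authors' implementation: the helper names (`poisson_pmf`, `zero_inflate`, `mixture`, `looks_like_valid_pmf`), the truncation point `k_max`, and the tolerance `tol` are all assumptions made for illustration. The sketch builds a Poisson PMF, wraps it in zero inflation and a finite mixture, and applies a simple numerical check that a candidate expression is non-negative and sums to approximately one.

```python
import math
from typing import Callable, Sequence

# A PMF is modeled here as a function from a non-negative integer to a probability.
Pmf = Callable[[int], float]


def poisson_pmf(lam: float) -> Pmf:
    """Closed-form Poisson PMF, exp(-lam) * lam**k / k!, evaluated in log space."""
    def pmf(k: int) -> float:
        return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return pmf


def zero_inflate(base: Pmf, pi: float) -> Pmf:
    """Zero inflation: extra mass pi at k = 0, the rest follows the base PMF."""
    def pmf(k: int) -> float:
        return (pi if k == 0 else 0.0) + (1.0 - pi) * base(k)
    return pmf


def mixture(components: Sequence[Pmf], weights: Sequence[float]) -> Pmf:
    """Finite mixture: weighted sum of component PMFs (weights should sum to 1)."""
    def pmf(k: int) -> float:
        return sum(w * c(k) for w, c in zip(weights, components))
    return pmf


def looks_like_valid_pmf(pmf: Pmf, k_max: int = 200, tol: float = 1e-6) -> bool:
    """Hypothetical validity check on the truncated support 0..k_max.

    Assumes any mass beyond k_max is negligible; a candidate that is negative,
    non-finite, or does not sum to roughly 1 is rejected.
    """
    values = [pmf(k) for k in range(k_max + 1)]
    if any((not math.isfinite(v)) or v < 0.0 for v in values):
        return False
    return abs(sum(values) - 1.0) <= tol


if __name__ == "__main__":
    zip_model = zero_inflate(poisson_pmf(12.0), pi=0.3)
    two_component = mixture([poisson_pmf(2.0), poisson_pmf(12.0)], [0.4, 0.6])
    print(looks_like_valid_pmf(zip_model))
    print(looks_like_valid_pmf(two_component))
    # An unnormalized candidate such as 0.5**k sums to about 2 and is rejected.
    print(looks_like_valid_pmf(lambda k: 0.5 ** k))
```

A search procedure could use a check of this kind to discard candidate expressions that are not proper distributions before estimating their parameters. How the paper's validity-aware stage actually works is not described in this summary.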
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Symbolic Density Estimation | PBMC gene 4046 | MSE | 0.1263 | 6 |
| Parameter Estimation | Poisson Distribution (Synthetic) | Estimated Lambda (λ) | 12.01 | 3 |
| Parameter Estimation | Beta-Binomial Distribution Synthetic | Parameter alpha | 1.98 | 3 |
| PMF Estimation | Beta-Binomial distribution | Max Error (%) | 0.81 | 3 |
| Parameter Estimation | Binomial Distribution Synthetic | Estimated p (Probability) | 30 | 3 |
| Parameter Estimation | Geometric Distribution Synthetic | Parameter p | 30 | 3 |
| Parameter Estimation | Negative Binomial Distribution Synthetic | Parameter r | 9.99 | 3 |
| PMF Estimation | Poisson distribution | Max Error | 10 | 3 |
| PMF Estimation | Binomial distribution | Max Error | 18 | 3 |
| PMF Estimation | Geometric distribution | Max Error (%) | 26 | 3 |