Efficient Distribution Learning with Error Bounds in Wasserstein Distance
About
The Wasserstein distance has emerged as a key metric to quantify distances between probability distributions, with applications in various fields, including machine learning, control theory, decision theory, and biological systems. Consequently, learning an unknown distribution with non-asymptotic and easy-to-compute error bounds in Wasserstein distance has become a fundamental problem in many fields. In this paper, we devise a novel algorithmic and theoretical framework to approximate an unknown probability distribution $\mathbb{P}$ from a finite set of samples by an approximate discrete distribution $\widehat{\mathbb{P}}$ while bounding the Wasserstein distance between $\mathbb{P}$ and $\widehat{\mathbb{P}}$. Our framework leverages optimal transport, nonlinear optimization, and concentration inequalities. In particular, we show that, even if $\mathbb{P}$ is unknown, the Wasserstein distance between $\mathbb{P}$ and $\widehat{\mathbb{P}}$ can be efficiently bounded with high confidence by solving a tractable optimization problem (a mixed integer linear program) of a size that only depends on the size of the support of $\widehat{\mathbb{P}}$. This enables us to develop intelligent clustering algorithms to optimally find the support of $\widehat{\mathbb{P}}$ while minimizing the Wasserstein distance error. On a set of benchmarks, we demonstrate that our approach outperforms state-of-the-art comparable methods by generally returning approximating distributions with substantially smaller support and tighter error bounds.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Distribution-learning | Gaussian distribution rho = 2 (test) | Wasserstein Distance Error0.131 | 48 | |
| Distribution-learning | Gaussian distribution rho = 2 | Wasserstein Distance Error0.131 | 48 | |
| Distribution-learning | Uniform distribution rho = 1 | Wasserstein Distance Error0.025 | 48 | |
| Wasserstein distance error bound estimation | Emissions real-world | -- | 1 | |
| Wasserstein distance error bound estimation | Miniboone | -- | 1 | |
| Wasserstein distance error bound estimation | OCTMNIST | -- | 1 |