Adversarial random forests for density estimation and generative modeling
About
We propose methods for density estimation and data synthesis using a novel form of unsupervised random forests. Inspired by generative adversarial networks, we implement a recursive procedure in which trees gradually learn structural properties of the data through alternating rounds of generation and discrimination. The method is provably consistent under minimal assumptions. Unlike classic tree-based alternatives, our approach provides smooth (un)conditional densities and allows for fully synthetic data generation. We achieve comparable or superior performance to state-of-the-art probabilistic circuits and deep learning models on various tabular data benchmarks while executing about two orders of magnitude faster on average. An accompanying R package, `arf`, is available on CRAN.
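To make the "generation" half of one round concrete, the following is a minimal sketch in plain Python, not the `arf` implementation. It assumes the first round draws synthetic rows from the product of the feature marginals, so each column keeps its distribution while dependence between columns is destroyed. The discriminator, a random forest in the paper, is not implemented here; the sketch only builds the labelled real-versus-synthetic training set it would receive. The helper names `sample_product_of_marginals` and `pearson` and the toy data are hypothetical.

```python
import random


def sample_product_of_marginals(rows, rng):
    """Return synthetic rows whose columns are resampled independently.

    Each column is drawn with replacement from its own observed values,
    so marginals are preserved but cross-column structure is broken.
    """
    n = len(rows)
    columns = list(zip(*rows))
    synth_cols = [[rng.choice(col) for _ in range(n)] for col in columns]
    return [tuple(vals) for vals in zip(*synth_cols)]


def pearson(xs, ys):
    """Plain Pearson correlation, used only to inspect the toy data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)

# Toy "real" data (an assumption for illustration): the second feature
# depends strongly on the first.
real = []
for _ in range(2000):
    x = rng.gauss(0, 1)
    real.append((x, 2 * x + rng.gauss(0, 0.5)))

# Generation step: same marginals, no dependence between features.
synth = sample_product_of_marginals(real, rng)

# Input to the discrimination step: label 1 = real, 0 = synthetic.
labelled = [(row, 1) for row in real] + [(row, 0) for row in synth]

real_corr = pearson(*zip(*real))
synth_corr = pearson(*zip(*synth))
print(f"correlation in real data:      {real_corr:.2f}")
print(f"correlation in synthetic data: {synth_corr:.2f}")
```

A classifier trained on `labelled` can separate the two classes only by exploiting the dependence that the synthetic rows lack. This is the sense in which the discrimination step can learn structural properties of the data.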
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Tabular Data Synthesis Fidelity | biodeg | KS Statistic (Mean) | 0.55 | 90 |
| Tabular Data Synthesis Fidelity | steel | KS Statistic (Mean) | 0.64 | 90 |
| Tabular Data Synthesis Fidelity | PROTEIN | Mean KS Statistic | 0.74 | 88 |
| Tabular Data Synthesis Fidelity | fourier | KS Fidelity | 0.75 | 88 |
| Tabular Data Synthesis Fidelity | Texture | KS Statistic (Mean) | 0.9 | 64 |
| Tabular Data Synthesis | fourier | Chi-squared Result | 0.01 | 48 |
| Tabular Data Synthesis | biodeg | Chi-Squared Test Result | 0.05 | 47 |
| Tabular Data Synthesis | steel | Chi-squared Test Result | 0.14 | 47 |
| Classification | biodeg | Balanced Accuracy | 78.78 | 45 |
| Classification | steel | Balanced Accuracy | 66.33 | 45 |