Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles
About
Tree ensembles, including boosting methods, are highly effective and widely used for tabular data. However, large ensembles lack interpretability and require longer inference times. We introduce a method to prune a tree ensemble into a reduced version that is "functionally identical" to the original model. In other words, our method guarantees that the prediction function stays unchanged for any possible input. As a consequence, this pruning algorithm is lossless for any aggregated metric. We formalize the problem of functionally identical pruning on ensembles, introduce an exact optimization model, and provide a fast yet highly effective method to prune large ensembles. Our algorithm iteratively prunes considering a finite set of points, which is incrementally augmented using an adversarial model. In multiple computational experiments, we show that our approach is a "free lunch", significantly reducing the ensemble size without altering the model's behavior. Thus, we can preserve state-of-the-art performance at a fraction of the original model's size.
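To make "functionally identical" concrete, here is a minimal toy sketch in Python. It is not the authors' algorithm: it assumes a one-dimensional input and an ensemble of weighted decision stumps, and it swaps the paper's adversarial model for exhaustive enumeration. In one dimension the ensemble is piecewise constant between thresholds, so checking one point per cell covers every possible input. The greedy removal loop and all names (`score`, `predict`, `cell_representatives`, `greedy_prune`) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# A stump is (threshold, left_value, right_value):
# it outputs left_value if x <= threshold, else right_value.
Stump = Tuple[float, float, float]


def score(stumps: List[Stump], weights: List[float], x: float) -> float:
    """Weighted sum of stump outputs at input x."""
    return sum(w * (left if x <= t else right)
               for w, (t, left, right) in zip(weights, stumps))


def predict(stumps: List[Stump], weights: List[float], x: float) -> int:
    """Binary class: 1 if the weighted score is non-negative, else 0."""
    return 1 if score(stumps, weights, x) >= 0 else 0


def cell_representatives(stumps: List[Stump]) -> List[float]:
    """One point per cell (-inf, t1], (t1, t2], ..., (tk, +inf).

    Every stump is constant on each cell, so agreeing on these points
    means agreeing on every real input (toy 1-D assumption).
    """
    ts = sorted({t for t, _, _ in stumps})
    mids = [(a + b) / 2 for a, b in zip(ts, ts[1:])]
    return [ts[0] - 1.0] + mids + [ts[-1] + 1.0]


def greedy_prune(stumps: List[Stump], weights: List[float]) -> List[float]:
    """Zero out weights one at a time while predictions stay unchanged.

    Greedy illustration only; it does not guarantee a minimum-size ensemble.
    """
    reps = cell_representatives(stumps)
    target = [predict(stumps, weights, x) for x in reps]
    pruned = list(weights)
    for i in range(len(stumps)):
        trial = pruned.copy()
        trial[i] = 0.0
        if [predict(stumps, trial, x) for x in reps] == target:
            pruned = trial  # removal is safe for every possible input
    return pruned


stumps = [(0.0, -1.0, 1.0), (0.0, -1.0, 1.0), (2.0, -0.2, 0.2), (0.0, -0.5, 0.5)]
weights = [1.0, 1.0, 1.0, 1.0]
pruned = greedy_prune(stumps, weights)
kept = sum(1 for w in pruned if w != 0.0)
print(f"kept {kept} of {len(stumps)} stumps; weights = {pruned}")
```

In this toy case the pruned ensemble predicts the same class as the original at every input, not only on a test set, which is why any aggregated metric is preserved. In higher dimensions the number of cells grows too quickly to enumerate, which is the role the abstract assigns to the incrementally augmented point set and the adversarial model.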
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Pruning Boosted Tree Ensembles | Adult | Pruning Rate | 21.3 | 7 |
| Pruning Boosted Tree Ensembles | Balance Scale | Pruning Rate | 71.3 | 7 |
| Pruning Boosted Tree Ensembles | Breast Cancer Wisconsin | Pruning Rate | 26 | 7 |
| Pruning Boosted Tree Ensembles | COMPAS-ProPublica | Pruning Rate | 58 | 7 |
| Pruning Boosted Tree Ensembles | elec2 | Pruning Rate | 30 | 7 |
| Pruning Boosted Tree Ensembles | FICO | Pruning Rate | 26 | 7 |
| Pruning Boosted Tree Ensembles | HTRU2 | Pruning Rate | 62 | 7 |
| Pruning Boosted Tree Ensembles | JM1 | Pruning Rate | 6 | 7 |
| Pruning Boosted Tree Ensembles | Pima Diabetes | Pruning Rate | 17.3 | 7 |
| Pruning Boosted Tree Ensembles | POL | Pruning Rate | 52 | 7 |