Minimax Rates and Spectral Distillation for Tree Ensembles

About

Tree ensembles such as random forests (RFs) and gradient boosting machines (GBMs) are among the most widely used supervised learners, yet their theoretical properties remain incompletely understood. We adopt a spectral perspective on these algorithms, with two main contributions. First, we derive minimax-optimal convergence rates for RF regression, showing that, under mild regularity conditions on tree growth, the eigenvalue decay of the induced kernel operator governs the statistical rate. Second, we exploit this spectral viewpoint to develop compression schemes for tree ensembles. For RFs, the leading eigenfunctions of the kernel operator capture the dominant predictive directions; for GBMs, the leading singular vectors of the smoother matrix play an analogous role. Learning nonlinear maps for these spectral representations yields distilled models that are orders of magnitude smaller than the originals while maintaining competitive predictive performance. Our methods compare favorably to state-of-the-art algorithms for forest pruning and rule extraction, with applications to resource-constrained computing.

Binh Duc Vu, David S. Watson • 2026
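
The spectral viewpoint in the abstract can be illustrated with a small sketch. This is not the authors' method: as a stand-in for fitted trees it uses random one-dimensional partitions (a hypothetical simplification), defines the kernel as the fraction of partitions in which two points share a cell, and replaces the forest predictor with a simple kernel-weighted average. The names `partition_kernel` and `leading_spectrum` and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def partition_kernel(x, n_trees=200, n_splits=8, seed=0):
    """Co-occurrence kernel: K[i, j] is the fraction of random partitions
    in which x[i] and x[j] fall into the same cell."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    K = np.zeros((n, n))
    for _ in range(n_trees):
        # Assumption: random thresholds on [0, 1] stand in for a grown tree.
        cuts = np.sort(rng.uniform(0.0, 1.0, size=n_splits))
        leaf = np.searchsorted(cuts, x)  # cell index of each point
        K += leaf[:, None] == leaf[None, :]
    return K / n_trees

def leading_spectrum(K, k):
    """Top-k eigenpairs of the symmetric matrix K / n, largest first."""
    n = K.shape[0]
    vals, vecs = np.linalg.eigh(K / n)  # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, size=300))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)

K = partition_kernel(x)
vals, vecs = leading_spectrum(K, k=10)

# Kernel-weighted average of the responses; only an approximation of how a
# forest averages leaf means across trees.
S = K / K.sum(axis=1, keepdims=True)
y_full = S @ y

# Low-rank surrogate: project the smoothed fit onto the top-k eigenvectors.
y_lowrank = vecs @ (vecs.T @ y_full)
rel_err = np.linalg.norm(y_full - y_lowrank) / np.linalg.norm(y_full)

print("leading eigenvalues:", np.round(vals[:5], 4))
print("relative error of rank-10 surrogate:", rel_err)
```

In this toy setting, how quickly the printed eigenvalues decay indicates how many spectral directions are needed to reproduce the smoothed fit, which is the intuition behind compressing an ensemble into a few leading components.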

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Classification | Adult | Accuracy | 85 | 86 |
| Classification | Churn | Accuracy | 86 | 59 |
| Regression | California | R2 Score | 75 | 49 |
| Classification | banknote | Accuracy | 100 | 32 |
| Classification | Spambase | Accuracy | 94 | 28 |
| Model Compression | banknote | Accuracy / R2 | 100 | 26 |
| Model Compression | Diabetes | Accuracy / R2 Score | 80 | 26 |
| Model Compression | Boston | Accuracy / R2 Score | 84.2 | 26 |
| Model Compression | California | Accuracy / R2 | 79.2 | 26 |
| Regression | Diabetes dataset | R2 | 0.8 | 17 |
Showing 10 of 52 rows
