TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees

About

We revisit the use of probabilistic values, which include the well-known Shapley and Banzhaf values, to rank features for explaining the local predicted values of decision trees. The quality of feature rankings is typically assessed with the insertion and deletion metrics. Empirically, we observe that co-optimizing these two metrics is closely related to a joint optimization that selects a subset of features to maximize the local predicted value while minimizing it for the complement. However, we theoretically show that probabilistic values are generally unreliable for solving this joint optimization. Therefore, we explore deriving feature rankings by directly optimizing the joint objective. As the backbone, we propose TreeGrad, which computes the gradients of the multilinear extension of the joint objective in $O(L)$ time for decision trees with $L$ leaves; these gradients include weighted Banzhaf values. Building upon TreeGrad, we introduce TreeGrad-Ranker, which aggregates the gradients while optimizing the joint objective to produce feature rankings, and TreeGrad-Shap, a numerically stable algorithm for computing Beta Shapley values with integral parameters. In particular, the feature scores computed by TreeGrad-Ranker satisfy all the axioms uniquely characterizing probabilistic values, except for linearity, which itself leads to the established unreliability. Empirically, we demonstrate that the numerical error of Linear TreeShap can be up to $10^{15}$ times larger than that of TreeGrad-Shap when computing the Shapley value. As a by-product, we also develop TreeProb, which generalizes Linear TreeShap to support all probabilistic values. In our experiments, TreeGrad-Ranker performs significantly better on both insertion and deletion metrics. Our code is available at https://github.com/watml/TreeGrad.

Weida Li, Yaoliang Yu, Bryan Kian Hsiang Low• 2026

Related benchmarks

Task	Dataset	Result
First-order Shapley value calculation	CalHousing small setting	Runtime (s)0.186	6
First-order Shapley value calculation	CalHousing large setting	Runtime (s)1.08e+3	6
First-order Shapley value calculation	CalHousing (sparse setting)	Runtime (s)14.772	6
First-order Shapley value calculation	CovType small setting	Runtime (s)0.091	6
First-order Shapley value calculation	CovType large setting	Runtime (s)3.881	6
First-order Shapley value calculation	CovType sparse setting	Runtime (s)0.679	6
First-order Shapley value calculation	Fashion-MNIST small setting	Runtime (seconds)1.97	6
First-order Shapley value calculation	Fashion-MNIST large setting	Runtime (s)738.3	6
First-order Shapley value calculation	Fashion-MNIST sparse setting	Runtime (s)104.5	5

Showing 9 of 9 rows

Other info

Follow for update

@wizwand_team Discord