Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees

About

We revisit the use of probabilistic values, which include the well-known Shapley and Banzhaf values, to rank features for explaining the local predicted values of decision trees. The quality of feature rankings is typically assessed with the insertion and deletion metrics. Empirically, we observe that co-optimizing these two metrics is closely related to a joint optimization that selects a subset of features to maximize the local predicted value while minimizing it for the complement. However, we theoretically show that probabilistic values are generally unreliable for solving this joint optimization. Therefore, we explore deriving feature rankings by directly optimizing the joint objective. As the backbone, we propose TreeGrad, which computes the gradients of the multilinear extension of the joint objective in $O(L)$ time for decision trees with $L$ leaves; these gradients include weighted Banzhaf values. Building upon TreeGrad, we introduce TreeGrad-Ranker, which aggregates the gradients while optimizing the joint objective to produce feature rankings, and TreeGrad-Shap, a numerically stable algorithm for computing Beta Shapley values with integral parameters. In particular, the feature scores computed by TreeGrad-Ranker satisfy all the axioms uniquely characterizing probabilistic values, except for linearity, which itself leads to the established unreliability. Empirically, we demonstrate that the numerical error of Linear TreeShap can be up to $10^{15}$ times larger than that of TreeGrad-Shap when computing the Shapley value. As a by-product, we also develop TreeProb, which generalizes Linear TreeShap to support all probabilistic values. In our experiments, TreeGrad-Ranker performs significantly better on both insertion and deletion metrics. Our code is available at https://github.com/watml/TreeGrad.

Weida Li, Yaoliang Yu, Bryan Kian Hsiang Low• 2026

Related benchmarks

TaskDatasetResultRank
First-order Shapley value calculationCalHousing small setting
Runtime (s)0.186
6
First-order Shapley value calculationCalHousing large setting
Runtime (s)1.08e+3
6
First-order Shapley value calculationCalHousing (sparse setting)
Runtime (s)14.772
6
First-order Shapley value calculationCovType small setting
Runtime (s)0.091
6
First-order Shapley value calculationCovType large setting
Runtime (s)3.881
6
First-order Shapley value calculationCovType sparse setting
Runtime (s)0.679
6
First-order Shapley value calculationFashion-MNIST small setting
Runtime (seconds)1.97
6
First-order Shapley value calculationFashion-MNIST large setting
Runtime (s)738.3
6
First-order Shapley value calculationFashion-MNIST sparse setting
Runtime (s)104.5
5
Showing 9 of 9 rows

Other info

Follow for update