Very fast, approximate counterfactual explanations for decision forests
About
We consider finding a counterfactual explanation for a classification or regression forest, such as a random forest. This requires solving an optimization problem to find the closest input instance to a given instance for which the forest outputs a desired value. Finding an exact solution has a cost that is exponential in the number of leaves in the forest. We propose a simple but very effective approach: we constrain the optimization to only those input space regions defined by the forest that are populated by actual data points. The problem then reduces to a form of nearest-neighbor search using a certain distance on a certain dataset. This has two advantages. First, the solution can be found very quickly, scaling to large forests and high-dimensional data, and enabling interactive use. Second, the solution found is more likely to be realistic, in that it is guided towards high-density areas of input space.
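The reduction to nearest-neighbor search can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the paper's implementation: the abstract does not specify the distance or the dataset, so this sketch assumes plain Euclidean distance and uses the data points themselves as candidates. The tuple-based tree encoding, the toy forest, and the function names `predict_tree`, `predict_forest`, and `nearest_counterfactual` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical tree encoding (an assumption for this sketch):
#   ("leaf", label)  or  ("split", feature_index, threshold, left, right)

def predict_tree(tree, x):
    """Walk one tree from the root to a leaf and return the leaf label."""
    while tree[0] == "split":
        _, feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree[1]

def predict_forest(forest, x):
    """Majority vote over the trees (classification forest)."""
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]

def nearest_counterfactual(forest, data, x, target):
    """Return the data point closest to x that the forest labels `target`.

    Assumption: Euclidean distance to the data points themselves. The
    search is restricted to points that actually occur in the data, so
    it is a linear scan rather than a search over all forest regions.
    """
    candidates = [p for p in data if predict_forest(forest, p) == target]
    if not candidates:
        return None  # no data point receives the desired output
    return min(candidates, key=lambda p: math.dist(p, x))

# Toy forest of three decision stumps on 2-D inputs (illustrative only).
forest = [
    ("split", 0, 0.5, ("leaf", 0), ("leaf", 1)),
    ("split", 1, 0.5, ("leaf", 0), ("leaf", 1)),
    ("split", 0, 0.7, ("leaf", 0), ("leaf", 1)),
]
data = [(0.1, 0.2), (0.9, 0.8), (0.6, 0.9), (0.8, 0.1), (0.2, 0.9)]
x = (0.3, 0.3)  # the forest labels this instance 0

counterfactual = nearest_counterfactual(forest, data, x, target=1)
print(counterfactual)
```

On this toy data, three points are labeled 1 by the forest, and the one nearest to `x` is `(0.8, 0.1)`. The cost is one forest evaluation per data point plus a distance computation, which is linear in the dataset size rather than exponential in the number of leaves. A spatial index could replace the linear scan for larger datasets.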
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Counterfactual Explanations | Breast-Cancer (BC) | T02.9 | 4 |
| Counterfactual Explanations | PD | T06 | 4 |
| Counterfactual Explanations | COMPAS CP | T01.4 | 4 |
| Counterfactual Explanations | FI | T021.4 | 4 |