Density-Guided Robust Counterfactual Explanations on Tabular Data under Model Multiplicity
About
Counterfactual explanations (CEs) are essential for actionable recourse, yet their reliability is often compromised in low-density regions, where classifiers exhibit high variance. Unlike existing methods that rely on expensive ensemble intersections to define stability, we propose \textit{DensityFlow}, a generative framework that constructs robust CEs by adhering to the high-confidence data manifold. Specifically, we model counterfactual generation as continuous-time dynamics parameterized by a Neural ODE and guided by a differentiable density score that actively steers trajectories away from uncertain, low-density areas. This density score is learned via Noise Contrastive Estimation, using a $(K{+}1)$-way discriminator to estimate density ratios. For black-box settings, we introduce a local proxy distillation mechanism that aligns a lightweight surrogate with the target model only along the CE generation trajectory, enabling efficient gradient-based optimization with minimal queries. Experiments demonstrate that \textit{DensityFlow} achieves superior validity under model multiplicity while requiring significantly fewer queries than ensemble-based baselines. Our implementation is available at https://github.com/G-AILab/DensityFlow.
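To make the idea of density-guided continuous-time dynamics concrete, here is a minimal sketch under strong simplifying assumptions. It is not the DensityFlow implementation: instead of a learned Neural ODE velocity field and an NCE-trained density score, it uses a hand-written gradient-flow velocity built from a hypothetical fixed logistic classifier and a hypothetical Gaussian log-density for the target class, integrated with explicit Euler steps. The names `W`, `B`, `CENTER`, `SIGMA`, `lam`, and `mu` are illustrative assumptions.

```python
import math

# Hypothetical toy setup (not from the paper): a fixed logistic classifier
# and an isotropic Gaussian standing in for the learned class-1 density score.
W = (1.0, 1.0)
B = -2.0
CENTER = (2.0, 2.0)
SIGMA = 1.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def classifier_prob(x):
    """Probability of the target class (class 1) under the toy classifier."""
    return sigmoid(W[0] * x[0] + W[1] * x[1] + B)


def velocity(x, x0, lam=0.5, mu=0.1):
    """Gradient-flow velocity: class term + density term + proximity term."""
    s = classifier_prob(x)
    v = []
    for i in range(2):
        grad_cls = (1.0 - s) * W[i]                    # d/dx log sigmoid(w.x + b)
        grad_dens = (CENTER[i] - x[i]) / SIGMA ** 2    # d/dx log N(x; c, sigma^2 I)
        grad_prox = -(x[i] - x0[i])                    # pulls x back toward the query
        v.append(grad_cls + lam * grad_dens + mu * grad_prox)
    return v


def generate_counterfactual(x0, dt=0.1, steps=100):
    """Integrate dx/dt = velocity(x) from x0 with explicit Euler steps."""
    x = list(x0)
    for _ in range(steps):
        v = velocity(x, x0)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


x_cf = generate_counterfactual((0.0, 0.0))
print(x_cf, classifier_prob(x_cf))
```

Starting from a query the toy classifier rejects, the trajectory moves toward the high-density region of the target class rather than stopping just past the decision boundary; the weight `lam` trades off density adherence against the proximity weight `mu`.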
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Counterfactual Explanations | COMPAS | Validity | 72.9 | 21 |
| Counterfactual Explanations | moons | Validity | 99.7 | 19 |
| Counterfactual Explanations | HELOC | Validity | 75.7 | 19 |
| Counterfactual Explanation | Adult | Cost | 1.597 | 5 |
| Counterfactual Explanations | blood | Cost | 1.527 | 5 |
| Counterfactual Explanations | circles | Cost | 0.683 | 5 |
| Counterfactual Explanations | Spirals | Cost | 0.487 | 5 |
| Counterfactual Explanations | chessboard | Cost | 1.088 | 5 |
| Counterfactual Explanation Plausibility | Spirals | LOF Score | 1.07 | 5 |
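The benchmark entries report Validity and Cost without defining them here. As an illustration only, the sketch below assumes that validity under model multiplicity is the percentage of (model, counterfactual) pairs for which a model assigns the target class, and that cost is the mean L1 distance between each query and its counterfactual. These conventions, the linear models, and the points are hypothetical and are not confirmed to match the evaluation behind the reported numbers.

```python
# Assumed metric conventions (hypothetical, not confirmed by the source).

def predict(model, x):
    """Hypothetical linear model (w, b): class 1 iff w.x + b >= 0."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0


def validity_under_multiplicity(models, counterfactuals, target=1):
    """Percentage of (model, CE) pairs where the model predicts the target class."""
    hits = sum(predict(m, x) == target for m in models for x in counterfactuals)
    return 100.0 * hits / (len(models) * len(counterfactuals))


def l1_cost(originals, counterfactuals):
    """Mean L1 distance between each query and its counterfactual."""
    total = sum(
        sum(abs(a - b) for a, b in zip(x0, x))
        for x0, x in zip(originals, counterfactuals)
    )
    return total / len(originals)


# Three slightly different models, mimicking retrained variants (model multiplicity).
models = [((1.0, 1.0), -2.0), ((1.2, 0.8), -2.1), ((0.9, 1.1), -1.8)]
originals = [(0.0, 0.0), (0.0, 0.0)]
counterfactuals = [(1.9, 1.9), (1.0, 0.95)]  # a deep CE and a near-boundary CE

print(validity_under_multiplicity(models, counterfactuals))
print(l1_cost(originals, counterfactuals))
```

In this toy case the counterfactual far from the boundary stays valid for all three models, while the near-boundary one is accepted by only one of them, which is the failure mode that ensemble- and density-based robustness methods target.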