Do Fair Models Reason Fairly? Counterfactual Explanation Consistency for Procedural Fairness in Credit Decisions

About

Machine learning algorithms in socially sensitive domains (e.g., credit decisions) often focus on equalizing predictive outcomes. However, satisfying such outcome-based metrics does not guarantee that models use the same reasoning for different groups. We show that existing outcome-fair models can still apply fundamentally different reasoning to individuals, a "hidden procedural bias" missed by standard fairness metrics and algorithms. We propose Counterfactual Explanation Consistency (CEC), a framework that detects and mitigates this bias by aligning feature attributions between individuals and their counterfactual counterparts. Key contributions include a nearest-neighbor counterfactual generation method, a modified baseline for integrated gradient comparisons, an individual-level procedural fairness metric, and a corresponding training loss. We introduce a taxonomy identifying "Regime B" (same outcome, different reasoning) as a critical blind spot. Experiments on synthetic data, German Credit, Adult Income, and HMDA mortgage data demonstrate that outcome-fair baselines exhibit substantial hidden bias, while CEC substantially reduces it at a modest utility cost.
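
To make the idea of comparing attributions between an individual and a counterfactual counterpart concrete, here is a minimal sketch. It is not the authors' implementation: the logistic model, the Euclidean nearest-neighbor rule, the all-zeros baseline, the L1 gap, and every function name (`integrated_gradients`, `nearest_counterfactual`, `attribution_gap`) are assumptions made for illustration, and the paper's modified baseline and exact metric are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def integrated_gradients(x, baseline, w, b, steps=50):
    """Midpoint Riemann-sum approximation of integrated gradients
    for a logistic model f(x) = sigmoid(w @ x + b) (assumed model)."""
    alphas = (np.arange(steps) + 0.5) / steps             # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)    # (steps, d) points on the straight path
    p = sigmoid(path @ w + b)                             # model output along the path
    grads = (p * (1.0 - p))[:, None] * w                  # gradient of f at each path point
    return (x - baseline) * grads.mean(axis=0)            # per-feature attribution

def nearest_counterfactual(x, group, X, groups):
    """Closest row of X (Euclidean distance) whose group differs from `group`.
    This is an assumed stand-in for the paper's nearest-neighbor method."""
    candidates = X[groups != group]
    dists = np.linalg.norm(candidates - x, axis=1)
    return candidates[np.argmin(dists)]

def attribution_gap(x, x_cf, baseline, w, b):
    """L1 distance between the two attribution vectors (hypothetical metric)."""
    a = integrated_gradients(x, baseline, w, b)
    a_cf = integrated_gradients(x_cf, baseline, w, b)
    return float(np.abs(a - a_cf).sum())

# Illustrative, made-up data: three applicants, two features, two groups.
X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
groups = np.array([0, 1, 1])
w, b = np.array([1.0, -2.0]), 0.1
baseline = np.zeros(2)

x = X[0]
x_cf = nearest_counterfactual(x, groups[0], X, groups)
gap = attribution_gap(x, x_cf, baseline, w, b)
same_outcome = (sigmoid(x @ w + b) >= 0.5) == (sigmoid(x_cf @ w + b) >= 0.5)
```

Under these assumptions, a pair with `same_outcome` true and a large `gap` would correspond to the "same outcome, different reasoning" case the abstract calls Regime B; what counts as large is a threshold this sketch leaves open.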

Gideon Popoola, John Sheppard • 2026

Related benchmarks

Task | Dataset | Result | Rank
Fair Classification | German Credit (5-fold) | Pareto Non-Dominance Count: 5 | 7
Fair Classification | Synthetic (5-fold) | Pareto Non-Dominance Count: 5 | 7
Fair Classification | Adult Income (5-fold) | Pareto Non-Dominance Count: 5 | 7
Fair Classification | HMDA (5-fold) | Pareto Non-Dominance Count: 5 | 7
Classification | German Credit | AUC: 69.3 | 7
Classification | Adult Income | AUC: 0.876 | 7
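
The "Pareto Non-Dominance Count" metric is not defined on this page. One plausible reading, offered only as an assumption, is a count of evaluated points (for example, folds or configurations) that no competing point beats on every objective at once. The sketch below shows how such a count could be computed; the function name `non_dominated_mask` and the numbers are illustrative and are not results from the paper.

```python
import numpy as np

def non_dominated_mask(points):
    """Return a boolean mask of Pareto non-dominated rows.
    Every column is treated as 'higher is better'; negate a column
    (e.g. a bias measure) if lower is better."""
    points = np.asarray(points, dtype=float)
    mask = np.ones(len(points), dtype=bool)
    for i in range(len(points)):
        others = np.delete(points, i, axis=0)
        at_least_as_good = (others >= points[i]).all(axis=1)
        strictly_better = (others > points[i]).any(axis=1)
        # Row i is dominated if some other row is no worse everywhere
        # and strictly better somewhere.
        mask[i] = not np.any(at_least_as_good & strictly_better)
    return mask

# Made-up (utility, negated bias) pairs for three hypothetical methods.
scores = [[0.80, -0.10], [0.78, -0.05], [0.75, -0.12]]
mask = non_dominated_mask(scores)
count = int(mask.sum())
```

Here the third point is dominated by the first (worse on both objectives), while the first two trade off against each other, so two points remain on the Pareto front.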
