Do Fair Models Reason Fairly? Counterfactual Explanation Consistency for Procedural Fairness in Credit Decisions

About

Machine learning algorithms in socially sensitive domains (e.g., credit decisions) often focus on equalizing predictive outcomes. However, satisfying these metrics does not guarantee that models use the same reasoning for different groups. We show that existing outcome-fair models can still apply fundamentally different reasoning to individuals, a "hidden procedural bias" missed by standard fairness metrics and algorithms. We propose Counterfactual Explanation Consistency (CEC), a framework that detects and mitigates this bias by aligning feature attributions between individuals and their counterfactual counterparts. Key contributions include a nearest-neighbor counterfactual generation method, a modified baseline for integrated gradient comparisons, an individual-level procedural fairness metric, and a corresponding training loss. We introduce a taxonomy identifying "Regime B" (same outcome, different reasoning) as a critical blind spot. Experiments on synthetic data, German Credit, Adult Income, and HMDA mortgage data demonstrate that outcome-fair baselines exhibit substantial hidden bias, while CEC substantially reduces it with modest utility cost.

Gideon Popoola, John Sheppard • 2026
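
The abstract's core idea, comparing feature attributions between an individual and a counterfactual counterpart from another group, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy logistic model (`w`, `b`), the mean-feature baseline (standing in for the paper's modified baseline, which the abstract does not specify), the Euclidean nearest-neighbor match, the L1 gap, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of integrated gradients
    for a logistic model f(x) = sigmoid(w . x + b)."""
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # shape (steps, d)
    p = sigmoid(path @ w + b)                            # model output along the path
    grads = (p * (1.0 - p))[:, None] * w                 # df/dx at each path point
    return (x - baseline) * grads.mean(axis=0)

def nearest_counterfactual(i, X, group):
    """Index of the closest individual (Euclidean distance on the
    non-sensitive features X) that belongs to a different group."""
    others = np.flatnonzero(group != group[i])
    dists = np.linalg.norm(X[others] - X[i], axis=1)
    return int(others[np.argmin(dists)])

def explanation_gap(i, X, group, w, b, baseline):
    """L1 distance between the attributions of individual i and those of
    its nearest counterfactual (an assumed stand-in for a consistency score)."""
    j = nearest_counterfactual(i, X, group)
    a_i = integrated_gradients(X[i], baseline, w, b)
    a_j = integrated_gradients(X[j], baseline, w, b)
    return float(np.abs(a_i - a_j).sum()), j

# Toy data: four individuals, two non-sensitive features, binary group label.
X = np.array([[1.0, 2.0], [1.1, 2.1], [3.0, 0.5], [0.9, 1.8]])
group = np.array([0, 1, 1, 0])
w, b = np.array([0.8, -0.5]), 0.1      # hypothetical trained weights
baseline = X.mean(axis=0)              # assumed baseline, not the paper's

gap, match = explanation_gap(0, X, group, w, b, baseline)
print(f"individual 0 matched to {match}, attribution gap = {gap:.4f}")
```

A small gap means the model explains the individual and the matched counterpart in similar terms. Two individuals can receive the same decision and still show a large gap, which is the "same outcome, different reasoning" case the abstract calls Regime B.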

Related benchmarks

Task	Dataset	Result
Fair Classification	German Credit (5-fold)	Pareto Non-Dominance Count 5	7
Fair Classification	Synthetic (5-fold)	Pareto Non-Dominance Count 5	7
Fair Classification	Adult Income (5-fold)	Pareto Non-Dominance Count 5	7
Fair Classification	HMDA (5-fold)	Pareto Non-Dominance Count 5	7
Classification	German Credit	AUC 69.3	7
Classification	Adult Income	AUC 0.876	7
