Optimistic Feasible Search for Closed-Loop Fair Threshold Decision-Making
About
Closed-loop decision-making systems (e.g., lending, screening, or recidivism risk assessment) often operate under fairness and service constraints while inducing feedback effects: decisions change who appears in the future, yielding non-stationary data and potentially amplifying disparities. We study online learning of a one-dimensional threshold policy from bandit feedback under demographic parity (DP) and, optionally, service-rate constraints. The learner observes only a scalar score each round and selects a threshold; reward and constraint residuals are revealed only for the chosen threshold.

We propose Optimistic Feasible Search (OFS), a simple grid-based method that maintains confidence bounds for reward and constraint residuals for each candidate threshold. At each round, OFS selects a threshold that appears feasible under confidence bounds and, among those, maximizes optimistic reward; if no threshold appears feasible, OFS selects the threshold minimizing optimistic constraint violation. This design directly targets feasible high-utility thresholds and is particularly effective for low-dimensional, interpretable policy classes where discretization is natural.

We evaluate OFS on (i) a synthetic closed-loop benchmark with stable contraction dynamics and (ii) two semi-synthetic closed-loop benchmarks grounded in German Credit and COMPAS, constructed by training a score model and feeding group-dependent acceptance decisions back into population composition. Across all environments, OFS achieves higher reward with smaller cumulative constraint violation than unconstrained and primal-dual bandit baselines, and is near-oracle relative to the best feasible fixed threshold under the same sweep procedure. Experiments are reproducible and organized with double-blind-friendly relative output paths.
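The selection rule described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the class name, the update interface, and the Hoeffding-style confidence radius are assumptions, and a single scalar constraint residual (where a value at or below zero means the constraint is satisfied) stands in for the DP and service-rate constraints.

```python
import math
import numpy as np

class OFSSketch:
    """Illustrative sketch of Optimistic Feasible Search over a threshold grid.

    Each round the caller asks select() for a grid index, plays that
    threshold, and reports the observed reward and constraint residual
    (residual <= 0 means the constraint is satisfied) via update().
    """

    def __init__(self, grid, delta=0.05):
        self.grid = np.asarray(grid, dtype=float)  # candidate thresholds
        k = len(self.grid)
        self.n = np.zeros(k)        # pull counts per threshold
        self.r_mean = np.zeros(k)   # running mean reward per threshold
        self.g_mean = np.zeros(k)   # running mean constraint residual
        self.delta = delta          # confidence level (assumed parameter)
        self.t = 0                  # total rounds played

    def _radius(self):
        # Hoeffding-style confidence radius; unexplored thresholds get +inf,
        # which forces initial exploration of every grid point.
        log_term = 2.0 * math.log(max(self.t, 2) / self.delta)
        return np.where(self.n > 0,
                        np.sqrt(log_term / np.maximum(self.n, 1.0)),
                        np.inf)

    def select(self):
        rad = self._radius()
        r_ucb = self.r_mean + rad   # optimistic reward
        g_lcb = self.g_mean - rad   # lowest plausible constraint residual
        feasible = g_lcb <= 0.0     # "appears feasible under confidence bounds"
        if feasible.any():
            # Among optimistically feasible thresholds, maximize optimistic reward.
            idx = np.flatnonzero(feasible)
            return int(idx[np.argmax(r_ucb[idx])])
        # No threshold appears feasible: minimize optimistic constraint violation.
        return int(np.argmin(g_lcb))

    def update(self, i, reward, residual):
        self.t += 1
        self.n[i] += 1
        self.r_mean[i] += (reward - self.r_mean[i]) / self.n[i]
        self.g_mean[i] += (residual - self.g_mean[i]) / self.n[i]
```

In a toy two-threshold environment where one arm has high reward but a clearly positive residual and the other is feasible, the feasibility filter steers play toward the feasible arm once its confidence interval separates from zero.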
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Sequential Decision Making | MVE-S (tail mean over last 200 iterations) | Reward 0.3068 | 3 |
| Sequential Decision Making | German Credit (tail mean over last 200 iterations) | Reward 0.4546 | 3 |
| Sequential Decision Making | COMPAS | Reward 0.7556 | 3 |
| Constrained Online Selection | MVE-S (ε = 0.06, steady-state) | Oracle Reward 31.68 | 3 |
| Constrained Online Selection | German Credit (ε = 0.02, a ∈ [0.30, 0.99], steady-state) | Oracle Reward 0.4656 | 3 |
| Constrained Online Selection | COMPAS (ε = 0.03, a ∈ [0.30, 0.99], steady-state) | Oracle Reward 0.7665 | 3 |