Optimistic Feasible Search for Closed-Loop Fair Threshold Decision-Making
About
Closed-loop decision-making systems (e.g., lending, screening, or recidivism risk assessment) often operate under fairness and service constraints while inducing feedback effects: decisions change who appears in the future, yielding non-stationary data and potentially amplifying disparities. We study online learning of a one-dimensional threshold policy from bandit feedback under demographic parity (DP) and, optionally, service-rate constraints. The learner observes only a scalar score each round and selects a threshold; reward and constraint residuals are revealed only for the chosen threshold.

We propose Optimistic Feasible Search (OFS), a simple grid-based method that maintains confidence bounds for reward and constraint residuals for each candidate threshold. At each round, OFS selects a threshold that appears feasible under confidence bounds and, among those, maximizes optimistic reward; if no threshold appears feasible, OFS selects the threshold minimizing optimistic constraint violation. This design directly targets feasible high-utility thresholds and is particularly effective for low-dimensional, interpretable policy classes where discretization is natural.

We evaluate OFS on (i) a synthetic closed-loop benchmark with stable contraction dynamics and (ii) two semi-synthetic closed-loop benchmarks grounded in German Credit and COMPAS, constructed by training a score model and feeding group-dependent acceptance decisions back into population composition. Across all environments, OFS achieves higher reward with smaller cumulative constraint violation than unconstrained and primal-dual bandit baselines, and is near-oracle relative to the best feasible fixed threshold under the same sweep procedure. Experiments are reproducible and organized with double-blind-friendly relative output paths.
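The selection rule described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the class name, the update interface, and the Hoeffding-style confidence radius are assumptions, and a single scalar constraint residual (where a value at or below zero means the constraint is satisfied) stands in for the DP and service-rate constraints.

```python
import math
import numpy as np

class OFSSketch:
    """Illustrative sketch of Optimistic Feasible Search over a threshold grid.

    Each round the caller asks select() for a grid index, plays that
    threshold, and reports the observed reward and constraint residual
    (residual <= 0 means the constraint is satisfied) via update().
    """

    def __init__(self, grid, delta=0.05):
        self.grid = np.asarray(grid, dtype=float)  # candidate thresholds
        k = len(self.grid)
        self.n = np.zeros(k)        # pull counts per threshold
        self.r_mean = np.zeros(k)   # running mean reward per threshold
        self.g_mean = np.zeros(k)   # running mean constraint residual
        self.delta = delta          # confidence level (assumed parameter)
        self.t = 0                  # total rounds played

    def _radius(self):
        # Hoeffding-style confidence radius; unexplored thresholds get +inf,
        # which forces initial exploration of every grid point.
        log_term = 2.0 * math.log(max(self.t, 2) / self.delta)
        return np.where(self.n > 0,
                        np.sqrt(log_term / np.maximum(self.n, 1.0)),
                        np.inf)

    def select(self):
        rad = self._radius()
        r_ucb = self.r_mean + rad   # optimistic reward
        g_lcb = self.g_mean - rad   # lowest plausible constraint residual
        feasible = g_lcb <= 0.0     # "appears feasible under confidence bounds"
        if feasible.any():
            # Among optimistically feasible thresholds, maximize optimistic reward.
            idx = np.flatnonzero(feasible)
            return int(idx[np.argmax(r_ucb[idx])])
        # No threshold appears feasible: minimize optimistic constraint violation.
        return int(np.argmin(g_lcb))

    def update(self, i, reward, residual):
        self.t += 1
        self.n[i] += 1
        self.r_mean[i] += (reward - self.r_mean[i]) / self.n[i]
        self.g_mean[i] += (residual - self.g_mean[i]) / self.n[i]
```

In a toy two-threshold environment where one arm has high reward but a clearly positive residual and the other is feasible, the feasibility filter steers play toward the feasible arm once its confidence interval separates from zero.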
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Sequential Decision Making | MVE-S (tail mean over last 200 iterations) | Reward 0.3068 | 3 |
| Sequential Decision Making | German Credit (tail mean over last 200 iterations) | Reward 0.4546 | 3 |
| Sequential Decision Making | COMPAS | Reward 0.7556 | 3 |
| Constrained Online Selection | MVE-S (ε = 0.06, steady-state) | Oracle Reward 31.68 | 3 |
| Constrained Online Selection | German Credit (ε = 0.02, a ∈ [0.30, 0.99], steady-state) | Oracle Reward 0.4656 | 3 |
| Constrained Online Selection | COMPAS (ε = 0.03, a ∈ [0.30, 0.99], steady-state) | Oracle Reward 0.7665 | 3 |