
Optimistic Feasible Search for Closed-Loop Fair Threshold Decision-Making

About

Closed-loop decision-making systems (e.g., lending, screening, or recidivism risk assessment) often operate under fairness and service constraints while inducing feedback effects: decisions change who appears in the future, yielding non-stationary data and potentially amplifying disparities. We study online learning of a one-dimensional threshold policy from bandit feedback under demographic parity (DP) and, optionally, service-rate constraints. The learner observes only a scalar score each round and selects a threshold; reward and constraint residuals are revealed only for the chosen threshold. We propose Optimistic Feasible Search (OFS), a simple grid-based method that maintains confidence bounds for reward and constraint residuals for each candidate threshold. At each round, OFS selects a threshold that appears feasible under confidence bounds and, among those, maximizes optimistic reward; if no threshold appears feasible, OFS selects the threshold minimizing optimistic constraint violation. This design directly targets feasible high-utility thresholds and is particularly effective for low-dimensional, interpretable policy classes where discretization is natural. We evaluate OFS on (i) a synthetic closed-loop benchmark with stable contraction dynamics and (ii) two semi-synthetic closed-loop benchmarks grounded in German Credit and COMPAS, constructed by training a score model and feeding group-dependent acceptance decisions back into population composition. Across all environments, OFS achieves higher reward with smaller cumulative constraint violation than unconstrained and primal-dual bandit baselines, and is near-oracle relative to the best feasible fixed threshold under the same sweep procedure. Experiments are reproducible, with outputs organized via relative paths to support double-blind review.
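The OFS selection rule described above (play an optimistically feasible threshold with the highest optimistic reward, otherwise the one with the smallest optimistic violation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Hoeffding-style confidence bonus, the `delta` parameter, and the sign convention (constraint residual ≤ 0 means satisfied) are assumptions for the sketch.

```python
import math

class OFS:
    """Optimistic Feasible Search over a 1-D threshold grid (illustrative sketch).

    Tracks per-threshold means of reward and constraint residual, where a
    residual <= 0 (e.g., DP gap minus epsilon) means the constraint holds.
    """

    def __init__(self, grid, delta=0.05):
        self.grid = grid                     # candidate thresholds
        self.delta = delta                   # confidence parameter (assumed)
        self.n = [0] * len(grid)             # pull counts per threshold
        self.r_sum = [0.0] * len(grid)       # cumulative reward per threshold
        self.c_sum = [0.0] * len(grid)       # cumulative residual per threshold

    def _bonus(self, i, t):
        # Hoeffding-style confidence radius; unexplored arms get infinite bonus.
        if self.n[i] == 0:
            return float("inf")
        return math.sqrt(2.0 * math.log(max(t, 2) / self.delta) / self.n[i])

    def select(self, t):
        ucb_reward, lcb_residual = [], []
        for i in range(len(self.grid)):
            b = self._bonus(i, t)
            mean_r = self.r_sum[i] / self.n[i] if self.n[i] else 0.0
            mean_c = self.c_sum[i] / self.n[i] if self.n[i] else 0.0
            ucb_reward.append(mean_r + b)    # optimistic reward
            lcb_residual.append(mean_c - b)  # optimistic (lowest) violation
        # Optimistically feasible: residual could still be <= 0.
        feasible = [i for i in range(len(self.grid)) if lcb_residual[i] <= 0.0]
        if feasible:
            return max(feasible, key=lambda i: ucb_reward[i])
        # No arm looks feasible: minimize optimistic constraint violation.
        return min(range(len(self.grid)), key=lambda i: lcb_residual[i])

    def update(self, i, reward, residual):
        self.n[i] += 1
        self.r_sum[i] += reward
        self.c_sum[i] += residual
```

In a toy stationary bandit with one high-reward but infeasible threshold, this rule samples the infeasible arm only until its residual lower bound rises above zero, then concentrates play on the best feasible arm; the closed-loop benchmarks in the paper add non-stationary feedback on top of this basic mechanism.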

Wenzhang Du • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Sequential Decision Making | MVE-S (tail mean over the last 200 iterations) | Reward 0.3068 | 3 |
| Sequential Decision Making | German (tail mean over the last 200 iterations) | Reward 0.4546 | 3 |
| Sequential Decision Making | COMPAS | Reward 0.7556 | 3 |
| Constrained Online Selection | MVE-S (ε = 0.06, steady-state) | Oracle Reward 31.68 | 3 |
| Constrained Online Selection | German (ε = 0.02, a ∈ [0.30, 0.99], steady-state) | Oracle Reward 0.4656 | 3 |
| Constrained Online Selection | COMPAS (ε = 0.03, a ∈ [0.30, 0.99], steady-state) | Oracle Reward 0.7665 | 3 |
