
Identity-Free Deferral For Unseen Experts

About

Learning to Defer (L2D) improves AI reliability in decision-critical environments by training AI to either make its own prediction or defer the decision to a human expert. A key challenge is adapting to unseen experts at test time, whose competence can differ from the training population. Current methods for this task, however, can falter when unseen experts are out-of-distribution (OOD) relative to the training population. We identify a core architectural flaw as the cause: they learn identity-conditioned policies by processing class-indexed signals in fixed coordinates, creating shortcuts that violate the problem's inherent permutation symmetry. We introduce Identity-Free Deferral (IFD), an architecture that enforces this symmetry by construction. From a few-shot context, IFD builds a query-independent Bayesian competence profile for each expert. It then supplies the deferral rejector with a low-dimensional, role-indexed state containing only structural information, such as the model's confidence in its top-ranked class and the expert's estimated skill for that same role, which obscures absolute class identities. We train IFD using an uncertainty-aware, context-only objective that removes the need for expensive query-time expert labels. We formally prove the permutation invariance of our approach, contrasting it with the generic non-invariance of standard population encoders. Experiments on medical imaging benchmarks and ImageNet-16H with real human annotators show that IFD consistently improves generalisation to unseen experts, with gains in OOD settings, all while using fewer annotations than alternative methods.

Joshua Strong, Pramit Saha, Yasin Ibrahim, Cheng Ouyang, Alison Noble • 2025
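
The role-indexed state described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the Beta-Bernoulli posterior mean used for the competence profile, the choice of the top-k roles, and every function name below (`competence_profile`, `role_indexed_state`, `permute`) are assumptions made for illustration. The sketch only shows why a state indexed by rank rather than by class is unchanged when class labels are consistently permuted.

```python
from typing import List, Sequence, Tuple


def competence_profile(context: Sequence[Tuple[int, int]], num_classes: int,
                       prior_a: float = 1.0, prior_b: float = 1.0) -> List[float]:
    """Per-class posterior-mean accuracy of one expert.

    `context` holds few-shot (true_label, expert_label) pairs. A Beta(prior_a,
    prior_b) prior per class is an assumption of this sketch, not of the paper.
    """
    correct = [0] * num_classes
    total = [0] * num_classes
    for true_label, expert_label in context:
        total[true_label] += 1
        correct[true_label] += int(true_label == expert_label)
    return [(prior_a + correct[c]) / (prior_a + prior_b + total[c])
            for c in range(num_classes)]


def role_indexed_state(probs: Sequence[float], profile: Sequence[float],
                       k: int = 2) -> List[float]:
    """Model confidence and expert skill for the top-k ranked classes.

    The output is ordered by rank ("role"), so absolute class indices never
    appear in it. Ties in `probs` would make the ranking ambiguous; this sketch
    assumes there are none.
    """
    order = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    top = order[:k]
    return [probs[c] for c in top] + [profile[c] for c in top]


def permute(values: Sequence[float], perm: Sequence[int]) -> List[float]:
    """Relabel classes: the entry for class c moves to position perm[c]."""
    out = [0.0] * len(values)
    for c, v in enumerate(values):
        out[perm[c]] = v
    return out


# Hypothetical toy data: 3 classes, one query, five context examples.
probs = [0.1, 0.7, 0.2]
context = [(0, 0), (0, 1), (1, 1), (1, 1), (2, 0)]
state = role_indexed_state(probs, competence_profile(context, 3))

# Apply the same relabelling to the query probabilities and to the context.
perm = [2, 0, 1]
probs_p = permute(probs, perm)
context_p = [(perm[y], perm[m]) for y, m in context]
state_p = role_indexed_state(probs_p, competence_profile(context_p, 3))

print(state == state_p)
```

A rejector fed only `state` cannot key its decision on which class index is involved, which is the shortcut the abstract attributes to identity-conditioned policies. A state built from class-indexed vectors in fixed coordinates would, in general, change under the same relabelling.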

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Learning to Defer | ImageNet-16H ID | SAC 76 | 12 |
| Learning to Defer | Cifar100 Rapid Fatigue (test) | AUACC 63.97 | 10 |
| Learning to Defer | Cifar100 Sustained High Performance (test) | AU Accuracy 73.46 | 10 |
| Learning to Defer | Cifar100 Normal Fatigue (test) | AUACC 67.87 | 10 |
| Learning to Defer | ImageNet-16H OOD | SAC 76 | 9 |
| Learning to Defer | HAM10000 ID | SAC 86 | 8 |
| Learning to Defer | Blood Cells ID | SAC 89 | 8 |
| Learning to Defer | Liver tumours ID | SAC 88 | 8 |
| Learning to Defer | HAM10000 OOD | SAC 86 | 6 |
| Learning to Defer | Blood Cells OOD | SAC 89 | 6 |
Showing 10 of 11 rows
