When More Experts Hurt: Underfitting in Multi-Expert Learning to Defer
About
Learning to Defer (L2D) enables a classifier to abstain from predictions and defer to an expert, and has recently been extended to multi-expert settings. In this work, we show that multi-expert L2D is fundamentally more challenging than the single-expert case. With multiple experts, the classifier's underfitting becomes inherent and seriously degrades prediction performance, whereas in the single-expert setting it arises only under specific conditions. We theoretically show that this stems from an intrinsic expert identifiability issue: learning which expert to trust from a diverse pool. This problem is absent in the single-expert case and renders existing underfitting remedies ineffective. To tackle this issue, we propose PiCCE (Pick the Confident and Correct Expert), a surrogate-based method that adaptively identifies a reliable expert based on empirical evidence. PiCCE effectively reduces multi-expert L2D to a single-expert-like learning problem, thereby resolving multi-expert underfitting. We further prove its statistical consistency and its ability to recover class probabilities and expert accuracies. Extensive experiments across diverse settings, including real-world expert scenarios, validate our theoretical results and demonstrate improved performance.
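To make the deferral setup concrete, here is a minimal, illustrative inference rule for multi-expert L2D. This is *not* the PiCCE surrogate from the paper; it is a toy sketch assuming we already have the classifier's predictive distribution for an input and an empirical accuracy estimate for each expert. The system predicts itself when its confidence beats every expert's estimated reliability, and otherwise defers to the most reliable expert:

```python
def defer_decision(class_probs, expert_accs):
    """Toy multi-expert learning-to-defer inference rule (illustrative only).

    class_probs: classifier's predicted class probabilities for one input.
    expert_accs: empirical accuracy estimate for each expert.

    Returns ("classify", class_index) if the classifier's confidence
    exceeds every expert's estimated accuracy, else ("defer", expert_index)
    for the most reliable expert.
    """
    conf = max(class_probs)
    # Pick the single most reliable expert from the pool.
    best_expert = max(range(len(expert_accs)), key=lambda j: expert_accs[j])
    if conf >= expert_accs[best_expert]:
        return ("classify", class_probs.index(conf))
    return ("defer", best_expert)
```

For example, with classifier output `[0.6, 0.3, 0.1]` and expert accuracy estimates `[0.9, 0.7]`, the rule defers to expert 0; with a confident output like `[0.95, 0.03, 0.02]`, it classifies directly. The paper's point is that training which expert to route to is the hard part: with a diverse pool, identifying the right expert is what causes the classifier to underfit, and PiCCE addresses this by collapsing the pool to a single reliable expert during learning.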
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Learning to Defer | CIFAR-100 Animal Expert | Error Rate | 18.11 | 20 |
| Learning to Defer | CIFAR-100 Overlapped Animal Expert | Error Rate | 15.1 | 20 |
| Learning to Defer | ImageNet Dog Expert | Error Rate | 41.32 | 20 |
| Learning to Defer | ImageNet Overlapped Dog Expert | Error Rate | 41.23 | 20 |
| Learning to Defer | CIFAR-100 varying-accuracy synthetic expert (test) | Error Rate | 18.09 | 20 |
| Learning to Defer | ImageNet varying-accuracy synthetic expert (val) | Error Rate | 41.18 | 20 |
| Classification | MiceBone | Error Rate | 13.03 | 16 |
| Classification | Chaoyang | Error Rate | 1.02 | 16 |