Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

UniPROT: Uniform Prototype Selection via Partial Optimal Transport with Submodular Guarantees

About

Selecting prototypical examples from a source distribution to represent a target data distribution is a fundamental problem in machine learning. Existing subset selection methods often rely on implicit importance scores, which can be skewed towards majority classes and lead to low-quality prototypes for minority classes. We present $\methodprop$, a novel subset selection framework that minimizes the optimal transport (OT) distance between a uniformly weighted prototypical distribution and the target distribution. While intuitive, this formulation leads to a cardinality-constrained maximization of a \emph{super-additive} objective, which is generally intractable to approximate efficiently. To address this, we propose a principled reformulation of the OT marginal constraints, yielding a partial optimal transport-based submodular objective. We prove that this reformulation enables a greedy algorithm with a $(1-1/e)$ approximation guarantee relative to the original super-additive maximization problem. Empirically, we showcase that enforcing uniform prototype weights in UniPROT consistently improves minority-class representation in imbalanced classification benchmarks without compromising majority-class accuracy. In both finetuning and pretraining regimes for large language models under domain imbalance, UniPROT enforces uniform source contributions, yielding robust performance gains. Our results establish UniPROT as a scalable, theoretically grounded solution for uniform-weighted prototype selection. Our code is publicly available at GitHub\footnote{Code: https://github.com/efficiency-learning/UniPROT}

Prateek Chanda, Prayas Agrawal, Karthik S. Gurumoorthy, Ganesh Ramakrishnan, Bamdev Mishra, Pratik Jawanpuria• 2026

Related benchmarks

TaskDatasetResultRank
Mathematical ReasoningMATH
Accuracy16.8
882
Mathematical ReasoningSVAMP
Accuracy86.2
403
Mathematical ReasoningMathematics
Accuracy37.2
46
Mathematical ReasoningSIMULEQ
Accuracy68.28
30
Mathematical ReasoningNUMGLUE
Accuracy68.8
30
Mathematical ReasoningMATH
Accuracy38.4
26
Mathematical ReasoningMATHINSTRUCT Evaluation Suite In-domain and Out-of-domain
GSM8K Accuracy54.89
11
Natural Language UnderstandingSuperGLUE
Accuracy (SST2)94.65
6
Showing 8 of 8 rows

Other info

Follow for update