Dodgersort: Uncertainty-Aware VLM-Guided Human-in-the-Loop Pairwise Ranking

About

Pairwise comparison labeling is gaining adoption because it yields higher inter-rater reliability than conventional classification labeling, but exhaustively comparing n items requires a number of comparisons quadratic in n. We propose Dodgersort, which combines CLIP-based hierarchical pre-ordering, a neural ranking head, a probabilistic ensemble (Elo, Bradley-Terry-Luce, and Gaussian process), epistemic-aleatoric uncertainty decomposition, and information-theoretic pair selection. It reduces the number of human comparisons while improving the reliability of the resulting rankings. On visual ranking tasks in medical imaging, historical dating, and aesthetics, Dodgersort reduces annotation by 11–16% while improving inter-rater reliability. Cross-domain ablations across four datasets show that neural adaptation and ensemble uncertainty are the key contributors to this gain. On FG-NET, which provides ground-truth ages, the framework extracts 5–20× more ranking information per comparison than the baselines, yielding Pareto-optimal accuracy–efficiency trade-offs.
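The abstract names the components of the selection loop but not their formulas. The following is a minimal sketch of how ensemble disagreement can drive pair selection, under several assumptions that are not taken from the paper: a two-member ensemble of Elo and Bradley-Terry-Luce (BTL) models stands in for the Elo/BTL/GP ensemble (the Gaussian process member, CLIP pre-ordering, and neural ranking head are omitted); the epistemic part of the uncertainty is the mutual information between the comparison outcome and the ensemble member (entropy of the mean prediction minus mean member entropy); and the next pair shown to the annotator is the one with the highest epistemic uncertainty. The function names, the Elo K-factor of 32, and the Gamma(2, 1) prior on BTL strengths are hypothetical, illustrative choices.

```python
import itertools
import math


def binary_entropy(p):
    """Entropy (nats) of a Bernoulli(p) outcome; 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def elo_ratings(n_items, comparisons, k=32.0, base=1500.0):
    """Sequential Elo updates; each comparison is a (winner, loser) pair."""
    r = [base] * n_items
    for winner, loser in comparisons:
        expected_w = 1.0 / (1.0 + 10 ** ((r[loser] - r[winner]) / 400.0))
        delta = k * (1.0 - expected_w)  # the winner's actual score is 1
        r[winner] += delta
        r[loser] -= delta
    return r


def btl_strengths(n_items, comparisons, iters=200):
    """MAP Bradley-Terry-Luce strengths via MM updates under a Gamma(2, 1) prior.

    Update: s_i <- (wins_i + 1) / (1 + sum_j n_ij / (s_i + s_j)); the prior
    keeps every strength positive, even for items that never won.
    """
    wins = [0] * n_items
    counts = {}  # unordered pair (i, j) with i < j -> number of comparisons
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        counts[key] = counts.get(key, 0) + 1
    s = [1.0] * n_items
    for _ in range(iters):
        denom = [1.0] * n_items  # prior rate term b = 1
        for (i, j), n in counts.items():
            denom[i] += n / (s[i] + s[j])
            denom[j] += n / (s[i] + s[j])
        s = [(wins[i] + 1.0) / denom[i] for i in range(n_items)]
    return s


def member_probs(i, j, elo, btl):
    """P(item i beats item j) under each ensemble member."""
    p_elo = 1.0 / (1.0 + 10 ** ((elo[j] - elo[i]) / 400.0))
    p_btl = btl[i] / (btl[i] + btl[j])
    return [p_elo, p_btl]


def decompose(probs):
    """Split predictive entropy into aleatoric and epistemic parts."""
    total = binary_entropy(sum(probs) / len(probs))  # entropy of the mean
    aleatoric = sum(binary_entropy(p) for p in probs) / len(probs)
    epistemic = total - aleatoric  # mutual information; >= 0 by concavity
    return total, aleatoric, epistemic


def select_pair(n_items, comparisons):
    """Return the pair (i, j) with the largest epistemic uncertainty."""
    elo = elo_ratings(n_items, comparisons)
    btl = btl_strengths(n_items, comparisons)
    return max(
        itertools.combinations(range(n_items), 2),
        key=lambda pair: decompose(member_probs(*pair, elo, btl))[2],
    )


if __name__ == "__main__":
    # (winner, loser) judgments collected so far for 4 hypothetical images
    history = [(0, 1), (0, 2), (1, 2), (3, 2)]
    print("next pair to show the annotator:", select_pair(4, history))
```

Selecting on the epistemic term rather than total entropy avoids spending human comparisons on pairs that are genuinely close (high aleatoric noise), where even a well-informed ensemble cannot predict the outcome. This is one common reading of "information-theoretic pair selection"; the paper's exact criterion may differ.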

Yujin Park, Haejun Chung, Ikbeom Jang • 2026

Related benchmarks

Task             | Dataset                             | Metric                         | Result | Rank
Pairwise ranking | EyePACS, DHCI, and TAD66k (average) | Average human annotation count | 400    | 12
Visual ranking   | EyePACS                             | Spearman correlation           | 0.86   | 4
Visual ranking   | Historical DHCI                     | Spearman correlation           | 0.6    | 4
Visual ranking   | Aesthetics TAD66k                   | Spearman correlation           | 0.47   | 4
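The visual-ranking rows report Spearman rank correlation between two orderings of the same items. For rankings without ties it reduces to rho = 1 − 6 Σ d_i² / (n(n² − 1)), where d_i is the difference between item i's two ranks. Below is a minimal sketch of that formula, assuming no tied values (ties need average ranks, which this helper does not handle); the name `spearman_no_ties` is hypothetical.

```python
def spearman_no_ties(x, y):
    """Spearman's rho for two equal-length score lists with no tied values."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two lists of equal length >= 2")

    def ranks(values):
        # rank 1 goes to the smallest value
        order = sorted(range(n), key=lambda i: values[i])
        r = [0] * n
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # sum of squared rank differences
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

With ties, the general form is the Pearson correlation of average ranks, which is what `scipy.stats.spearmanr` computes.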
